EMPLOYEE ATTITUDES TOWARD TECHNOLOGICAL 
CHANGE IN A MEDIUM SIZED INSURANCE 
COMPANY 1 


EUGENE JACOBSON,2 DON TRUMBO,3 GLORIA CHEEK, AND JOHN NANGLE 


Labor and Industrial Relations Center and Department of Psychology, 
Michigan State University 


One important aspect of employee adjust- 
ment to technological change is the way in 
which the employee experiences the change. 
His perception of what is happening and how 
it affects him can be expected to influence his 
response to the change. When the employee’s 
experience of the change and his response to 
it are known, it is equally important to attempt 
to discover determinants of these 
phenomena. 

This paper is the first in a series reporting 
on a set of studies designed to explore the 
effect of supervisory practices, communica- 
tion, employee personality, and employee his- 
tory on employee response to change. In it 
we discuss some initial findings about office 
workers’ response to the introduction of a 
computing machine. Other reports will pre- 
sent material about possible determinants of 
these responses. 




Technological Change in the 
Office Situation 


The basic assumption of these studies is 
that change is always occurring in work situations. 
In the contemporary office in the United 
States the increased use of computing machines 
is making the study of change perhaps 
even more relevant than the corresponding 
introduction of automated work processes in 
the factory, where there is a longer history of 
adaptation to these devices. 


1 The research reported is part of a series of projects 
conducted in the Labor and Industrial Relations 
Center at Michigan State University under the guidance 
of Jack Stieber. Einar Hardin, Economics Department, 
cooperated with the authors in all phases 
of the study and is reporting, in other publications, 
on job changes and employee response. Doctoral 
dissertations based on these studies have been prepared 
by the junior authors under the supervision 
of James Karslake, Department of Psychology. 

2 Dr. Jacobson is on leave from Michigan State 
University as Chief, Division of Applied Social Sciences, 
Department of Social Sciences, UNESCO. 

Even in the larger offices, there has been 
some opportunity for experimenting with the 
new equipment. But it is only recently that 
the smaller companies have begun to use the 
complex data handling machines and the 
smallest still are not able to afford this equip- 
ment. When a medium sized company that 
has been using traditional data processing 
methods installs some of these machines it 
might be expected that employees would be 
very much aware of the change. It is this 
kind of situation—a medium sized insurance 
company using its first electronic data com- 
puting and storing procedures—that we chose 
for our first study of employee response to 
change. 


The Research Site 


The company employs about 500 persons, 
300 of them housed in a single central home 
office building, all engaged in activities related 
to selling and servicing insurance policies. 
Eighty percent of the nonsupervisory 
employees are women. Half of the women 
are married. Sixty-five percent of the nonsupervisory 
employees are less than 35 years 
old. About half of the nonsupervisory employees 
report that they are neither the only 
wage earner in the family nor the main wage 
earner and that their household could live 
adequately if they were not working. Almost 
all of the employees have lived in the 
same geographical area most of their lives. 
Almost all have high school educations, and 
about a quarter have some additional formal 
education. Forty percent of the nonsupervisory 
employees had worked for the company 
for a year or less at the time of the 
study. 


3 Dr. Trumbo is now on the staff of the Department 
of Psychology, Kansas State College, Manhattan, 
Kansas. 

The company is located in a city of about 
100,000 population, mixed industrial, com- 
mercial, and government with a large num- 
ber of offices employing clerical workers. The 
company has a reputation as a good place to 
work, a new modern building, and many bene- 
fit plans. It maintains pay scale at commu- 
nity level, has had a continuously expanding 
work force with no layoffs, and is perceived 
as a prosperous and growing organization. 

The company was organized in the early 
1900’s, had a relatively slow growth until the 
middle of the 1940’s when it began to expand 
rapidly and is still increasing in size. Until 
the early 1950’s the office was operated along 
traditional lines. There were some office ma- 
chines used but they were not central to the 
entire work operation. As the business ma- 
chine companies produced more and more 
elaborate data processing devices, the com- 
pany became more self-conscious about its 
work procedures and has had a history of re- 
examining work flow and adjusting methods 
to allow maximum use of the new equipment. 

But all of this change was at a relatively 
slow rate as compared with the changes introduced 
in August 1956, when a medium 
sized electronic computing device was installed. 
This machine can store and selectively 
reproduce data, perform a series of 
complex, related operations, and provide data 
for other machines. It is the response to the 
installation of this major technological innovation 
that we studied in February 1957. 


Procedure 


Before and during the period of installation of the 
computer we had regular contacts with the com- 
pany, talking with the staff and line people who 
were planning and supervising the change. The ma- 
chine was physically in the company’s offices in July 
1956. By December 1956 it had gone into produc- 
tion on a major processing operation. By the end 
of February 1957 the computer had been in use for 
about three months. 

At that time we administered pencil-and-paper 
questionnaires to all of the home office employees 
below the level of the Board of Directors. About 
230 nonsupervisory and about 50 supervisory em- 
ployees, assembled in three groups on two succes- 
sive days in the company’s meeting room, filled out 
the hour long questionnaires. This is about 85% of 
the total home office population. An examination 
of nonresponse showed no significant biased loss. 

Questionnaire material included items on response 
to change, response to the computer, supervision, 
job satisfaction, communication, and employee background 
and personal history. The questionnaires 
for the supervisory and nonsupervisory employees 
were essentially identical, except for items on supervision. 
These were framed in a complementary 
fashion, so that employees were responding about 
behavior of their supervisors and supervisors were 
responding about their own behavior. 


Results 


From these questionnaire materials we have 
selected four facets of the nonsupervisory em- 
ployees’ response to change for discussion: 
the employees’ perception of the general im- 
pact of the installation of the new computer 
on their jobs, the employees’ perception of 
the impact of machines in general on the of- 
fice situation and jobs, the employees’ gen- 
eral attitudes toward technological changes, 
and the employees’ perception of what is hap- 
pening to a number of specific aspects of their 
jobs because of technological change. 

In broad summary, these findings indicate 
that the bulk of the employees are sensitive 
to the change that has occurred, see it as hav- 
ing important effects on their own jobs and 
on the opportunities for employment in their 
occupational field. They recognize that work- 
ers are being replaced by machines but do 
not feel that they themselves will be affected. 
About a quarter of the employees feel that 
new developments in technology are taking 
place more rapidly than is desirable, but they 
themselves welcome change in their own jobs. 
They report that the kinds of changes that 
are traceable to the new equipment have to 
do with amount of work and variety and in- 
terest of the work, but do not believe that 
pay, promotion, or supervision have been af- 
fected by the change. 

In detail, the findings are as follows: 

How did the changeover affect the em- 
ployee? About one employee in three reports 
that the changeover to the computer had a 
relatively marked effect on his job. Two percent 
say they were promoted, 4% that they 
were transferred, and 27% that they kept 
the same jobs but the work was noticeably 
changed (Table 1). 

About 6% report dislike of the effect of 
the changeover, about one-half report that it 
made no difference to them, and 40% that 
they like the effect of the changeover. 

About one employee in three reports that 
the effect of the changeover was quite dis- 
rupting, and the remainder that it was only 
slightly or not at all disrupting. 

When asked to anticipate the effect of the 
computer on their jobs in the next year or 
two, about 40% of the employees see that it 
is probable that the computer will have some 
influence on them. About the same percent- 
age do not believe that the computer will 
affect their jobs. 


Table 1 


What Effect Did the Changeover (to the New 
Computer) Have on Your Job? 


                                                          Percentage 

was promoted                                                   2 
was transferred to another job                                 4 
kept the same job, but the work was greatly changed 
kept the same job, but the work was noticeably changed 
kept the same job, and the work was slightly changed 
kept the same job, and the work was not changed 
NA 


Table 2 


Are the Chances That a Machine Will Replace You on 
Your Job Greater or Less Than for Most Jobs? 


                                                 Percentage 

Much or somewhat greater than for most jobs 
Somewhat less than for most jobs 
Much less than for most jobs 
NA 


What are machines doing to jobs? About 
one quarter of the employees believe that 
machines have changed the nature of their 
jobs to a fairly large extent in the past two 
years. An additional 47% perceive some 
change in their work because of machines. 
This does not necessarily mean that the em- 
ployee is reporting a change in task. He 
may be reflecting what he senses to be 
changes in how his job fits in with others or 
the general work atmosphere. Eighty percent 
of the employees who perceive quite a bit of 
machine induced change say that they like it. 

About 1 in 10 believes that machines have 
replaced workers to a large extent in insur- 
ance companies in the past two years. An- 
other 60% say that machines have replaced 
workers in insurance companies to some ex- 
tent. About half of those who see this dis- 
placement occurring express themselves as be- 
ing indifferent to it, about a quarter approve 
and 1 in 5 disapproves. 

But although three quarters of the em- 
ployees believe that machines have replaced 
workers in insurance companies, about 80% 
feel that the chances that they themselves 
will be replaced by machines are less than for 
most jobs (Table 2). 

Those who foresee least likelihood of their 
being replaced are more happy with their 
predictions than the others. 

When asked to estimate what will happen 
to the total number of people doing their kind 
of job in the next five years, about 4 in 10 
see an increase, another 40% see the number 
remaining about the same and only 1 in 8 
sees the number of people doing his kind of 
work decreasing. Those who do see decreased 
opportunities in their kind of employment are 
less happy. 


Table 3 


The Job That You Would Consider Ideal for You Would 
Be One Where the Way You Do Your Work: 


                          Percentage 

Is always the same             3 
Changes very little            7 
Changes somewhat              44 
Changes quite a bit           28 
Changes a great deal          18 
NA 


What does the employee think about technological 
change? As a general comment on 
the impact of machines on jobs, about 1 employee 
in 4 believes that new developments 
in machines and methods for doing work are 
taking place more rapidly than is desirable. 
About half feel that the rate is satisfactory 
and only 1 in 6 feels that it is too slow. 

When asked what kind of job would be 
ideal for them, only 1 in 10 reports that he 
wants a job where the way the work is done 
changes only a little or not at all. Seventy 
percent want the work to change “somewhat” 
or “quite a bit,” and 1 in 5 wants the work to 
change a great deal (Table 3). 
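
The grouping of the Table 3 categories used in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python; the dictionary name and keys are labels chosen here, not part of the original questionnaire coding, and the NA row is omitted because no value is printed for it.

```python
# Percentages as printed in Table 3 (illustrative labels; NA row has no printed value).
table3 = {
    "always the same": 3,
    "changes very little": 7,
    "changes somewhat": 44,
    "changes quite a bit": 28,
    "changes a great deal": 18,
}

# "changes only a little or not at all"
little_or_none = table3["always the same"] + table3["changes very little"]
# "somewhat" or "quite a bit"
moderate = table3["changes somewhat"] + table3["changes quite a bit"]
# "a great deal"
great_deal = table3["changes a great deal"]

print(little_or_none)  # 10, the "1 in 10" of the text
print(moderate)        # 72, the "seventy percent" of the text
print(great_deal)      # 18, the "1 in 5" of the text
```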

Those employees who believe that more 
changes take place in the way they do their 
jobs than is true for the average task are 
more likely to say that they approve of this 
state of affairs than are the employees who 
say that their jobs tend to have fewer changes. 

About 40% of the employees would like to 
have a large part or all of their work involve 
the use of office machines. One in eight would 
prefer not to have his work involve the use 
of machines. 

About 40% of the employees do not see 
that their kind of job will require any more 
use of machines by 1960. However, half of 
the employees do see that their jobs will re- 
quire more use of machines and almost none 
reports that their jobs will require less use of 
machines. 

What aspects of the job has the computer 
affected? In Table 1 we found that about 
one third of the nonsupervisory employees in 
the company studied believed that the computer 
had created significant changes in their 
jobs. To determine the kinds of changes the 
employees believed were taking place, questions 
were asked about 11 aspects of the working 
situation. For each of these 11 aspects, 
three questions were asked: 


“Has this aspect of your job changed in the past 
year?” 

“How do you feel about this change (or lack of 
change) in your job?” 

“Did the changeover to the computer affect this 
aspect of your job?” 


Table 4 


Has This Aspect of Your Job Changed in the Past Year? 
“Yes, more now”: 


                                                      Total        Did the changeover to the 
                                                      Percentage   computer affect this 
                                                                   aspect of your job? 

The amount of variety in my work                          44 
The degree to which my work is interesting                44 
The amount of work required on this job                   43 
The amount of responsibility demanded on this job         40 
The amount of skill needed on this job                    38 
The degree of accuracy demanded by this job               38 
The amount of security I feel in this job                 34 
My chance for promotion to a better job                   32 
The extent to which I can pace my own work                29 
The amount of pay I get on this job                       29 
The amount of supervision I get                           14 


Because the bulk of the employees, about 
two thirds, did not report any major change 
in their work, we will first examine the dif- 
ferences in “no change” answers among the 
11 aspects of the job. 

Employees are more likely to report that 
there has been “no change” in the amount of 
supervision they receive, the amount of pay 
they receive, and their chances for promotion 
than that there has been “no change” in the 
amount of work they do. 

When asked directly whether the change- 
over to the computer had affected these as- 
pects of their jobs, an even larger percentage 
responded “no.” When we examine these 
“no” responses we find roughly the same or- 
dering. Employees are more likely to say 
that the computer did not affect the level of 
their pay than that it did not affect the 
amount of work they do or the variety in 
their work. 

Turning to the employees who did report 
change, in Table 4 we have an analysis of the 
response “yes, there is more now” for each of 
the 11 aspects of the job. First, we find that 
employees are more likely to report an increase 
in the variety of their work, the degree 
to which it is interesting, and the amount 
of work than they are to report an increase 
in amount of supervision or pay. 

But there is no simple relationship be- 
tween perceived increase in these aspects and 
imputed effect of the computer. The rela- 
tively small number of persons who report a 
pay increase in general do not attribute it to 
the introduction of the computer. The rela- 
tively large number who see that there is 
more variety in their work do attribute it to 
the computer. Most of those who report 
more security do not feel that the computer 
is responsible. In the other aspects, about 
half of those who report change attribute it 
to the computer. 

Among the small number of persons who 
report decreases in various aspects of their 
jobs, the computer is slightly more likely to 
be credited with decreases in amount of work, 
and slightly less likely to be credited with de- 
creases in amount of supervision. 


Summary 


Questionnaires about technological change 
and the installation of a new electronic com- 
puter were administered to all of the em- 
ployees of a medium sized insurance company. 

About one third of the nonsupervisory em- 
ployees reported that the introduction of the 
computer had affected their jobs. Most of 
the employees welcomed changes in their 
work, although they thought that changes 
were taking place somewhat too rapidly in 
the world in general. Most of the employees 
like to work with machines and expect more 
use of machines in the future. They believe 
that machines are replacing workers in office 
situations but do not feel that they them- 
selves will be replaced. They do not perceive 
that the introduction of the new technologies 
has had much effect on the amount of pay 
they get, their chances for promotion, or the 
amount of supervision they receive. But they 
do believe that the new technologies have 
changed the amount of work that they do and 
the degree to which there is variety in their 
work. 


A STATISTICAL EVALUATION OF EDWARDS 
PERSONAL PREFERENCE SCHEDULE 1 


EDWARD LEVONIAN, ANDREW COMREY, WILLIAM LEVY, AND DONALD PROCTER 


University of California, Los Angeles 


The Edwards Personal Preference Schedule 
(PPS) has been used widely in both the ap- 
plied and research fields as a means of meas- 
uring 15 “normal” personality variables (Ed- 
wards, 1954). To obtain information about 
the factor structure of the PPS, factor analy- 
ses were carried out for the items within each 
scale. The results of these 15 factor analyses 
constitute the subject of this report. 

The PPS was designed to measure the 
strength of personal need in the following 
scale variables: Achievement, Deference, Or- 
der, Exhibition, Autonomy, Affiliation, Intra- 
ception, Succorance, Dominance, Abasement, 
Nurturance, Change, Endurance, Heterosexu- 
ality, and Aggression. The test consists of 
225 items, each of which contains two alter- 
nate statements from which the subject is 
supposed to choose the one more nearly char- 
acterizing himself. The statements are sup- 
posed to be equal in social desirability. Fif- 
teen items occur twice to allow a check on 
respondent consistency. 

Except for the duplicate consistency items, 
each item is scored on two of the 15 scale 
variables. Thus, 29 items are scored for any 
given scale, but 28 of these items are also 
scored for another scale, two items for each 
of the other 14 scales. Each of these items 
is so phrased that one of the two available 
responses is scored positively for the scale in 
question, whereas the alternate response is 
scored positively for the other scale variable 
upon which the item is scored. This system 
gives the respondent a “forced choice” in 
which he can appear high (or low) in some 
but not all variables. 
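
The item counts stated above follow from the pairing of scales, and the arithmetic can be sketched as below. This is only a check under the assumption, suggested by the text, that every pair of scales is opposed in exactly two items and that each scale carries one duplicated consistency item; the constant names are chosen here for illustration.

```python
from math import comb

N_SCALES = 15
ITEMS_PER_SCALE_PAIR = 2   # assumption: two items oppose each pair of scales
N_DUPLICATES = 15          # consistency items that occur twice

scale_pairs = comb(N_SCALES, 2)                        # ways to pair 15 scales
distinct_items = scale_pairs * ITEMS_PER_SCALE_PAIR    # items scored on two scales
total_items = distinct_items + N_DUPLICATES            # length of the whole test

shared_per_scale = (N_SCALES - 1) * ITEMS_PER_SCALE_PAIR  # items shared with other scales
items_per_scale = shared_per_scale + 1                    # plus one duplicate (assumption)

print(scale_pairs, distinct_items, total_items)  # 105 210 225
print(shared_per_scale, items_per_scale)         # 28 29
```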

1 The authors are indebted to J. R. Marshall, 
Frances S. Taylor, and Hilde Groth, who participated 
in the early stages of the research but not sufficiently 
to warrant coauthorship. 

Apart from the 15 identical consistency 
items, other items are related to one another 
by virtue of being half identical. An item 
statement may appear in several items but in 
opposition to a different alternative statement 
in every case, except for the consistency 
items which repeat both statements. Any 
two items in this test, therefore, can be classified 
into one of five kinds of relationships 
that can have an important effect upon factorial 
structure. The five possible relationships 
are: 

Reciprocal consistency pair. When two 
items bear this relationship to each other, 
they are identical. Each scale has two and 
only two such items, e.g., Items 1 and 151, 
resulting in one and only one such pair. The 
pair is reciprocal because both items offer a 
choice between the same two scale variables. 

Reciprocal iterative pair. Items in this 
group form pairs which offer the same choice 
between scale variables to the respondent, 
e.g., Achievement vs. Order, but in addition 
are half identical. That is, two such items, 
e.g., Items 3 and 11, will include a common 
statement between them, although the other 
statement will be different in the two items. 
Fifty-two such pairs appear in the test. 

Reciprocal diverse pair. These items offer 
a choice between the same two variables but 
do not share a common statement. All four 
alternative statements in the two items, e.g., 
Items 2 and 6, are different. Fifty-three such 
pairs exist in the test. 

Nonreciprocal iterative pair. These items 
share a common statement but do not present 
a choice between the same two scale vari- 
ables. Due to the overlapping statement, one 
of the variables will be the same in both 
items, e.g., Items 6 and 27, but the opposed 
variable differs as well as the statements. 

Nonreciprocal diverse pair. A pair in this 
category shares no common statement, nor 
does it offer the same variable choice for the 
two items, e.g., Items 2 and 18. 
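
The five relationships can be restated as a small decision rule. The sketch below is a hypothetical illustration in Python, not the authors' procedure: each item is represented by the set of its two statements and the set of the two scales it opposes, and the statement labels (`s1`, `s2`, ...) are invented for the example rather than taken from the PPS.

```python
from typing import NamedTuple

class Item(NamedTuple):
    statements: frozenset  # the two statements in the item (hypothetical labels)
    scales: frozenset      # the two scale variables the item opposes

def classify_pair(a: Item, b: Item) -> str:
    """Name the relationship between two items, following the five definitions."""
    same_choice = a.scales == b.scales            # "reciprocal" in the text
    shared = len(a.statements & b.statements)     # number of common statements
    if shared == 2:
        return "reciprocal consistency"
    if same_choice:
        return "reciprocal iterative" if shared == 1 else "reciprocal diverse"
    return "nonreciprocal iterative" if shared == 1 else "nonreciprocal diverse"

ach_ord = frozenset({"Achievement", "Order"})
ach_def = frozenset({"Achievement", "Deference"})
item_x = Item(frozenset({"s1", "s2"}), ach_ord)
item_y = Item(frozenset({"s1", "s3"}), ach_ord)   # shares s1, same two scales
item_z = Item(frozenset({"s4", "s5"}), ach_ord)   # no shared statement, same scales
item_w = Item(frozenset({"s1", "s6"}), ach_def)   # shares s1, different opposed scale
item_v = Item(frozenset({"s7", "s8"}), ach_def)   # nothing shared but Achievement

for other in (item_x, item_y, item_z, item_w, item_v):
    print(classify_pair(item_x, other))
```

Run as written, the loop names the five types in the order in which the text defines them.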

From a consideration of what the PPS is 
supposed to measure and certain aspects of its 
composition, the following results might be 
expected: 

1. Substantial interitem correlations within 
scales should emerge. The 29 items for each 
scale are supposed to measure that scale vari- 
able, hence they should show substantial in- 
tercorrelations, resulting in one factor com- 
mon to all items. 

2. Fourteen doublet factors should emerge. 
Since every item on a scale is also scored on 
some other scale and there are two items for 
each other scale, a factor should appear for 
each pair of items measuring the same scale 
variable other than the one being analyzed. 
This would give 14 doublet factors on each 
scale. 

3. A consistency factor should emerge. 
Variance specific to the two identical items 
should appear as an essentially specific fac- 
tor on each scale. 


Procedure 


To test these expectations, each of the 15 scales 
was factor analyzed independently of the other 
scales.2 Data consisted of the responses of the first 
360 Ss chosen as every fourth case from the 1509 
cases in the original normative sample.3 Phi coefficients 
were used in the correlation tables since phis 
have been found to yield more satisfactory results 
in factor analytic work than other commonly used 
point coefficients (Comrey & Levonian, 1958). Fac- 
tors were extracted by the complete centroid method 
until at least three successive factors failed to yield 
a loading as large as .25. These factors were rotated 
analytically using Kaiser’s (1958) normal Varimax 
method, an orthogonal method which tends to maxi- 
mize the variance of the squared extended vector 
projections by pairs of factors over all possible pairs. 
Iteration is continued until an acceptable converg- 
ence of the solution has been achieved. No hand 
rotations were made. 
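
The pairwise rotation described above can be sketched in a few functions. This is a minimal illustration, not the SWAC program used by the authors: it applies the closed-form planar rotation angle usually given for the varimax criterion, it omits the row normalization of the "normal" Varimax variant, and the function names, tolerance, and demonstration loadings are all assumptions made here.

```python
import math

def varimax_pair_angle(x, y):
    """Angle (radians) that maximizes the raw varimax criterion for two factors."""
    n = len(x)
    u = [a * a - b * b for a, b in zip(x, y)]
    v = [2 * a * b for a, b in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = 2 * sum(ui * vi for ui, vi in zip(u, v))
    num = D - 2 * A * B / n
    den = C - (A * A - B * B) / n
    # atan2 picks the quadrant of 4*phi that gives a maximum rather than a minimum
    return math.atan2(num, den) / 4

def rotate_pair(x, y, phi):
    """Orthogonal rotation of two columns of loadings through angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    new_x = [a * c + b * s for a, b in zip(x, y)]
    new_y = [-a * s + b * c for a, b in zip(x, y)]
    return new_x, new_y

def varimax_criterion(columns):
    """Sum over factors of the variance of the squared loadings."""
    total = 0.0
    for col in columns:
        sq = [a * a for a in col]
        mean = sum(sq) / len(sq)
        total += sum((q - mean) ** 2 for q in sq) / len(sq)
    return total

def varimax(columns, tol=1e-8, max_sweeps=100):
    """Rotate every pair of factors in turn until no angle exceeds tol."""
    cols = [list(c) for c in columns]
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                phi = varimax_pair_angle(cols[i], cols[j])
                cols[i], cols[j] = rotate_pair(cols[i], cols[j], phi)
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return cols

# Demonstration on invented loadings: a clean two-factor pattern, deliberately
# rotated away from its axes, is brought back by the procedure.
x0, y0 = [0.8, 0.7, 0.0, 0.0], [0.0, 0.0, 0.6, 0.9]
mixed_x, mixed_y = rotate_pair(x0, y0, -0.3)
rot_x, rot_y = varimax([mixed_x, mixed_y])
```

Because each planar rotation is orthogonal, the sum of squared loadings in every row (the communality) is unchanged; only the distribution of variance across factors shifts.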


Results 


It will be impossible to present even in 
summary form the actual results of 15 factor 
analyses in so little space. Only certain outstanding 
features and general implications of 
the results will be treated. Those interested 
in more details may examine the complete 
documents available through the ADI.4 


2 All computations were performed on SWAC, an 
electronic digital computer operated by Numerical 
Analysis Research at the University of California, 
Los Angeles, and supported by the Office of Naval 
Research. The opinions expressed here are the authors’ 
and do not necessarily reflect those of the 
United States Navy. 

3 Professor Edwards kindly made available the response 
sheets of the 360 Ss. 

First of all, the intercorrelations between 
the items within scales are generally small. 
As an example of the interitem correlations 
encountered, the correlations for the first 
scale (Achievement) are given in Table 1. 
While most of the correlations are positive, 
the average correlation is less than .08, not 
significant. 

The interitem correlation table for the 
achievement scale is typical of the correlation 
tables for the other 14 scales. Although the 
Kaiser Varimax method of factor rotation 
does not tend to favor the emergence of a 
general factor, it is doubtful if any method of 
factor analysis would give a very strong gen- 
eral factor from tables of intercorrelations 
such as these. 

The second major finding showed that in- 
stead of having 14 doublet factors for each 
scale determined by reciprocal pairs of items 
supposedly measuring the same scale vari- 
ables, as was expected, factors were largely 
determined by the repeated statement. There 
are 105 reciprocal pairs of items in the PPS, 
each occurring in two analyses. Of these 210 
pairs, 104 are of the iterative type, sharing a 
common statement, and 106 are of the diverse 
type, not sharing a common statement. While 
73 iterative pairs of items appeared together 
on the same factor, only 21 diverse pairs did 
so. Of the 161 factors with two or more 
loadings of .30 or more, only 30 contained no 
common statement among their items. Only 
half the factors were loaded by reciprocal 
pairs and three-fourths of these were of the 
iterative type. Only about 10% of all fac- 
tors were loaded by reciprocal diverse pairs, 
and these were outnumbered 2 to 1 by fac- 
tors which were loaded by nonreciprocal itera- 
tive pairs. 
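
The counts in this paragraph are easier to compare as proportions. The short sketch below only reworks the figures quoted in the text; the variable names are chosen here for illustration.

```python
RECIPROCAL_PAIRS = 105
ANALYSES_PER_PAIR = 2   # each pair is seen in the analysis of both of its scales
pair_occurrences = RECIPROCAL_PAIRS * ANALYSES_PER_PAIR

iterative_total, diverse_total = 104, 106      # split of the 210 occurrences
iterative_together, diverse_together = 73, 21  # pairs loading the same factor

iterative_rate = iterative_together / iterative_total
diverse_rate = diverse_together / diverse_total

# Factors with two or more loadings of .30 or more, and those among them
# whose items share no common statement.
factors_total, factors_no_common = 161, 30
no_common_rate = factors_no_common / factors_total

print(pair_occurrences)          # 210
print(f"{iterative_rate:.0%}")   # 70%
print(f"{diverse_rate:.0%}")     # 20%
print(f"{no_common_rate:.0%}")   # 19%
```

On these figures, a reciprocal pair sharing a statement was about three and a half times as likely to load the same factor as a reciprocal pair with no shared statement.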

4 The following tables have been deposited with 
the American Documentation Institute: (1) item intercorrelations 
for each of the 15 scales, (2) rotated 
factors for each of the 15 analyses, (3) intercorrelation 
among reciprocal iterative and reciprocal diverse 
pairs, and (4) factors loaded by reciprocal pairs. 
Order Document No. 6078 from the ADI Auxiliary 
Publications Project, Photoduplication Service, Library 
of Congress, Washington 25, D. C., remitting 
in advance $2.25 for 35 mm. microfilm or $5.00 for 
6 X 8 in. photocopies readable without optical aid. 
Make checks payable to Chief, Photoduplication 
Service, Library of Congress. 


Table 2 


Comparison of Factor Loadings for Two 
Methods of Scoring 


          Alternative Scored as 1          Factor Loadings 
Item      Consistent    Scoring Key      Consistent    Scoring Key 


The third expectation, namely that a con- 
sistency factor would appear, was verified in 
every scale. The consistency factor, however, 
instead of taking only the specific variance 
left over, was usually one of the major fac- 
tors in the analysis. 

These factor results, and the ADI tables, 
are based on an item scoring method in which 
the B alternative to each item was scored as 
1, but essentially the same results would be 
obtained if the items were scored according 
to the key issued with the EPPS. For each 
scale this key scores the A alternatives as 1 
for half the items, the B alternatives as 1 for 
the remaining half. To demonstrate the rela- 
tive unimportance of the scoring method em- 
ployed, the first scale (Achievement) was fac- 
tor analyzed again using key scoring. Table 2 
gives the comparative results for the five items 
with the largest loadings on a corresponding 
factor for both types of scoring: (a) the consistent 
method with the B alternative consistently 
scored as 1, and (b) the scoring key 
method. The alternative scored as 1 is un- 
derlined. It is seen that the main effect of 
reversed scoring is to change the sign of the 
factor loading; however, the interpretation of 
the factor remains the same, since this inter- 
pretation takes into account the manner of 
scoring. 

In order to gain some understanding of the 
basis for the anomalous factor results, the in- 
teritem correlations were investigated accord- 
ing to the five different types of item pairs. 
The order of magnitude of phis from top to 
bottom for the various types of pairs was: 
consistency pairs, reciprocal iterative pairs, 
nonreciprocal iterative pairs, reciprocal diverse 
pairs, and nonreciprocal diverse pairs. 
The median values were, respectively, .50, 
.30, .21, .17, and .00. The fact that the 
median correlation was greater for iterative 
items not supposed to be measuring the same 
two variables than for the reciprocal diverse 
items, which are supposed to be measuring 
the same two variables, closely parallels the 
factor results. The low values of the cor- 
relations in general and particularly for the 
identical consistency items should be noted. 
The range of phi coefficients for the identical 
consistency items was .30 to .68. No non- 
identical item pair had a phi larger than .50. 
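
For readers who want to see how a phi coefficient of this kind is obtained, the sketch below computes it from two dichotomously scored items using the standard fourfold-table formula. The function name and the ten responses are invented for illustration and are not data from the normative sample.

```python
import math

def phi_coefficient(item_a, item_b):
    """Phi coefficient between two items scored 0 or 1 for the same respondents."""
    n11 = sum(1 for a, b in zip(item_a, item_b) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(item_a, item_b) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(item_a, item_b) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(item_a, item_b) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when an item has no variance")
    return (n11 * n00 - n10 * n01) / denom

# Invented responses of ten people to an item and to its later repetition:
# two of the ten change their answer the second time.
first_time = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
second_time = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
phi = phi_coefficient(first_time, second_time)
print(round(phi, 2))  # 0.6
```

In this made-up case eight of ten respondents answer the repeated item consistently, yet phi is only .60, which gives a sense of how much inconsistency the observed range of .30 to .68 for identical items implies.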


Discussion 


The results of these analyses reveal an 
unexpectedly large discrepancy between what 
the PPS is designed to measure and the actual 
item factorial content. Instead of finding 
large factors which are readily identifiable 
along the lines of the major variables scored 
in the test, one finds a large number of nar- 
row factors, the majority of which seems to 
be based upon shared common statements. 
Furthermore, the correlations are low between 
items which are supposed to measure the 
same variables. Even the same item repeated 
later in the test results in a median correla- 
tion of only .50. 

In the opinion of the authors the failure of 
the EPPS to give the expected factor results 
stems from: (a) using the same item statement 
in several different items, (b) scoring 
the same item on two scales, and (c) using 
the forced-choice item form with equated social 
desirability of the item statements. The 
first practice introduces significant amounts 
of overlapping specific variance and limits 
the sampling of trait indicators. The second 
practice also introduces overlapping variance 
into the scale scores. No item should be 
scored on more than one scale, unless perhaps 
as a suppressor variable. It is difficult for 
two scales to be independent of one another 
if they share the same items. 

The third and perhaps most serious diffi- 
culty lies with the use of forced-choice items. 
The basic form of the PPS item is one that 
encourages low reliability of response. The S 
must choose which of two statements seems 
more descriptive of himself, yet the choice is 
made more difficult by equating the state- 
ments for social desirability. Sometimes the 
choice is difficult because two statements 
seem about equally applicable. At other times 
the choice is difficult because the two state- 
ments seem about equally inapplicable. The 
test situation tends to maximize the number 
of difficult, and hence unreliable, choices for 
the S. Even for a conscientious respondent, 
it is difficult to be accurate and consistent 
under such circumstances. Less careful individuals 
easily develop a negative attitude toward 
the test situation, which promotes carelessness, 
further reducing the reliability of 
response. 

The EPPS has adopted this forced-choice
form for the purpose of avoiding respondent 
tendency to present a good picture of himself. 
Whereas this is a laudable objective, it does 
not seem to have been attained without ex-
cessive cost, if at all. Item form should make
it as easy as possible for the respondent to ex- 
press himself and his position as exactly as 
possible, truthfully or not. Whether or not 
the individual is answering truthfully, or giv- 
ing himself the benefit of the doubt, should 
be determined by other methods and this in- 
formation used in evaluating the test results. 
Attempts to force truthfulness by special item 
forms seem likely to succeed principally in 
reducing item reliability and validity to the 
point where the test has questionable utility. 

One of the most serious and persistent
problems confronting the life insurance indus- 
try is that of recruiting and selecting agents 
who will become successful “career” salesmen. 
Current attrition rates among new hires are 
close to 75-80% at the end of a three-year 
period. This great turnover of personnel
represents considerable costs to the industry 
in the continual recruitment, training, and 
financing of new agents, and in the financial 
support, over a period of time, of agents who 
are not likely to sell sufficient premiums to 
cover the company’s investment in them. 

The present study is part of a long range 
investigation by one company conducted for 
the purpose of evaluating existing screening 
procedures in the selection of life insurance 
agents and in developing efficient predictors 
of likely failures. 


Subjects 

Ss for this study were 522 male life insurance
agents who were hired by one company between
September 1, 1950 and December 31, 1954. These
agents were all financed by the company (i.e., hired
on a salary or advance basis). This sample comprises
the great majority of full-time financed agents
hired by the company between the dates specified
above. For all, at least a three-year period had
elapsed since the time of hire at the date the study
was designed in early 1958. All 522 agents were
originally selected from a large applicant population
on the basis of results obtained on the Activity
Vector Analysis (AVA) and personal history data
reported on a confidential questionnaire. The
selections were made, however, on a rather informal
basis since no definite criterion cut-offs were set.
Only guide lines were established for the general
agents who, in the final analysis, exercised their
prerogatives as to whom to hire or not hire.
Nevertheless, there is reason to believe that the
personality profiles were given more weight than the
personal history variables. The underlying reason for
this assumption is that an integrated personality
profile determined to be “best” for life insurance
salesmen was being used by the company during the
period in which the subjects were hired whereas no
such profile was determined for the personal history
variables. A survey (Walter V. Clarke Associates, Inc.
Staff, 1955) of the general agents attested to the
relative weights attached by them to these two sets
of data in their consideration of new applicants.
As a result the subjects of this study are more highly
restricted in their range of personality profiles than
they are on the personal history measures.


1 Further acknowledgment is gratefully made for
the helpful suggestions and criticisms given by Alice
L. Palubinskas of the Psychology Department, Tufts
University, who was research consultant on this study.


Predictors 


The two sets of predictors of success as life insurance
salesmen employed in this study were the personality
inventory and certain personal history variables
obtained from a locally prepared questionnaire
as indicated above. The Activity Vector Analysis
(AVA) is a self concept personality assessment
instrument. It is widely used in industry in the
classification and selection of personnel at all levels
of employment. The details of the construction and
application of the AVA have been published by
Clarke (1956b). Reliability and validity studies on
this inventory have been reported by Clarke (1956a;
1956c), Hammer (1958), Lundin (1957), Merenda
(1959a; 1959b), Musiker (1958), and Whisler (1957).
The personal history questionnaire covers the
following 20 areas of vital statistics, training and
experience: (1) age, (2) marital status, (3) number
of children, (4) military status, (5) educational level,
(6) percentage of educational expenses earned, (7)
number of organizations of which a member, (8)
number of offices held, (9) number of years of
previous work experience, (10) length of stay at
present residence, (11) dollar amount of unearned
monthly income, (12) dollar amount of outstanding
debts, (13) dollar amount of life insurance purchased
for self, (14) dollar amount of minimum monthly
living expenses, (15) attendance at sales and/or
non-sales courses, (16) employment status of wife, (17)
type of recreation in which normally engaged, (18)
previous sales experience, (19) total number of
friends, (20) number of friends in professional and
executive/managerial class.

These data were all recorded on the date of
application for employment of each of the agents of
this study. The AVA was also administered to each
at that time.





Predictive Variables in Success of Life Insurance Agents


Criteria 


For the purpose of evaluating the prediction va- 
lidities of the AVA and the personal history vari- 
ables as selectors of life insurance salesmen, the fol- 
lowing criterion standards were set for each subject 
at the expiration of his third year after first employ- 
ment. 


Success 


A successful agent is one who: 

1. Meets his Training Allowance Program quotas 
or achieves $200,000 production in his first year and 
at least $300,000 in either his second or third year; 
or 

2. Is advanced to a supervisory or management 
position within the company; or 

3. Leaves the company to become an agent, su- 
pervisor, or general agent of another company be- 
fore the end of the third year if he achieves the pro- 
duction goals outlined above. 


Failure 


An unsuccessful agent is one who: 

1. Fails to reach the production goals outlined, 
whether or not he remains as an agent with the com- 
pany; or 

2. Has had his contract terminated by the com- 
pany; or 

3. Leaves the insurance industry within a three- 
year period. 

The sample of 522 agents was dichotomized on 
these three-year criterion standards. A total of 414 
agents were classified as “unsuccessful” and only 108, 
as “successful.” It will be noted that a very high 
failure rate (4 out of 5) exists for the agents of this 
study. 
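
The criterion rules above can be read as a small decision procedure. The following is a minimal sketch of one such reading, not the company's actual scoring method: the record fields are hypothetical, and it assumes that termination or leaving the industry overrides any success condition, a precedence the text does not state.

```python
from dataclasses import dataclass

FIRST_YEAR_GOAL = 200_000   # dollars of production required in year 1
LATER_YEAR_GOAL = 300_000   # dollars required in year 2 or year 3


@dataclass
class AgentRecord:
    """Hypothetical three-year record; field names are illustrative only."""
    met_tap_quotas: bool
    production: tuple            # (year1, year2, year3) in dollars
    promoted: bool = False
    terminated_by_company: bool = False
    left_industry: bool = False


def met_production_goals(agent):
    """Training Allowance Program quotas, or the dollar production goals."""
    y1, y2, y3 = agent.production
    dollar_goals = y1 >= FIRST_YEAR_GOAL and max(y2, y3) >= LATER_YEAR_GOAL
    return agent.met_tap_quotas or dollar_goals


def is_successful(agent):
    # Assumption: a failure condition takes precedence over success.
    if agent.terminated_by_company or agent.left_industry:
        return False
    return agent.promoted or met_production_goals(agent)


def base_rate(n_success, n_total):
    """Proportion of successful agents in the sample."""
    return n_success / n_total


# 108 of 522 agents were successful, i.e. roughly 1 in 5.
print(round(base_rate(108, 522), 3))
```

The base rate matters later in the paper, where .20 serves as the chance level against which the probabilities of acceptance are compared.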

A further criterion in the form of new business 
volume (face value of insurance policies) at the end 
of the first, second, and third years was available 
for these agents.


Procedure 


The AVA and the personal history variables 
were studied separately as to their relative 
predictive efficiencies in determining the suc- 
cess or failure of the life insurance salesmen 
of this study. The results obtained by em-
ploying the AVA alone have been published
as a separate report (Merenda, 1959a). For
both sets of predictors discriminant analysis
was applied to the problem of providing
maximum separation between the successful
and unsuccessful agents of the sample of 522
life insurance salesmen. The resulting dis-
criminant functions were tested for statistical
significance and the empirically determined
weights for each battery in this initial valida-
tion study were used to build relative fre-
quency distributions of discriminant scores
for the purpose of establishing definite “go”-
“no go” criterion cut-offs.

The four-factor AVA resultant profile was 
used as the temperament measure set. For 
the personal history set only 5 of the 20 vari- 
ables were used in the battery. These five 
were the only ones which individually differ-
entiated the two groups. They are: (1) num-
ber of children, (2) educational level, (3)
number of offices held, (4) dollar amount of 
minimum monthly living expenses, and (5) 
dollar amount of life insurance purchased for 
self. For the other fifteen personal history 
measures, the frequency distributions for suc- 
cessful and unsuccessful agents were either 
nearly completely overlapping or else the dif- 
ferences in means and variances were not sta- 
tistically significant. 

Pearson-type correlation coefficients were 
computed between the four individual AVA 
resultant variates expressed as ordinary stand-
ard scores with mean = 50 and σ = 10 and
the five personal history measures converted 
to normalized T scores. Comparisons were 
also made of the relative efficiencies of the 
individual variates in predicting the di- 
chotomy. Finally, an analysis was made of 
the power of the two sets, used as independ- 
ent screens and employing variable cut-offs, 
in rejecting likely failures as life insurance 
salesmen at the end of three years after first 
hire. 
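
The two score transformations named in this paragraph can be sketched as follows. This is an illustrative sketch only: the paper does not give its computing formulas, so the rank-based normalization shown here (mid-rank proportions passed through the inverse normal curve) is an assumed, conventional way of obtaining normalized T scores, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def standard_scores(values, target_mean=50.0, target_sd=10.0):
    """Linear standard scores rescaled to mean 50 and SD 10."""
    m, s = mean(values), pstdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]


def normalized_t_scores(values):
    """Rank-based T scores: ranks -> normal deviates -> mean 50, SD 10."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average 1-based rank for ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # (rank - 0.5) / n keeps every proportion strictly between 0 and 1.
    return [50 + 10 * nd.inv_cdf((r - 0.5) / n) for r in ranks]


print(standard_scores([1, 2, 3]))
print(normalized_t_scores([10, 20, 30]))
```

The linear form preserves the shape of the raw distribution, whereas the normalized form forces skewed measures such as dollar amounts toward a normal shape, which is presumably why it was applied to the personal history data.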


Results and Discussion 


The results of discriminant analysis of the
AVA profile data are summarized in Tables 1
and 2. Resultant AVA profile shape was
used as the predictor variable. Standard
scores for the four principal AVA vectors
were transformed to deviation scores from
the composite mean in order to remove the
effect of Activity level (total number of
words checked). Hence, the profiles consti-
tuting the set of discriminant variates in this
study were expressed as sets of deviations
about the individual S’s mean.
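
A minimal sketch of this transformation, assuming the "composite mean" is simply the mean of the S's own four vector scores (the example scores are invented for illustration):

```python
def profile_deviations(vector_scores):
    """Express a profile as deviations about the individual's own mean.

    Subtracting the profile mean removes overall level, leaving only
    the shape of the profile.
    """
    profile_mean = sum(vector_scores) / len(vector_scores)
    return [score - profile_mean for score in vector_scores]


# Two hypothetical profiles with the same shape but different levels
# reduce to the same set of deviations.
print(profile_deviations([60, 55, 45, 40]))   # [10.0, 5.0, -5.0, -10.0]
print(profile_deviations([70, 65, 55, 50]))   # [10.0, 5.0, -5.0, -10.0]
```

Because the deviations of any one profile sum to zero, vectors on which an S scores below his own average necessarily carry negative values.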

The data of Table 1 show that the suc-
cessful agents possessed significantly higher
scores on AVA Vectors 1 and 2 (aggressive-
ness and sociability) and significantly lower
scores than the unsuccessful agents on AVA
Vectors 3 and 4 (emotional control and so-
cial adaptability). This pattern differentia-
tion is consistent with the hypothesized “best”
profile for life insurance agents. Table 2
gives the appropriate discriminant weights
for maximizing the separation between the
two groups. The analysis of maximum sepa-
ration yields an F value which is statistically
significant beyond the .05 level.

The results of discriminant analysis of the
set of personal history variates are summa-
rized in Tables 3 and 4. The data of Table 3
show that with the exception of “number of
offices held” in which the difference is not
statistically significant, the successful group
possessed significantly higher scores on these
measures than did the unsuccessful group. The
nonsignificant variable was retained because it
showed relatively high differentiation at the
end of one year and it was felt that if in-
cluded as a predictor in the three-year analy-
sis it would contribute, if only slightly, to the
prediction when combined with the other
variates in the battery.

Table 4 gives the appropriate discriminant
weights for maximizing the separation be-
tween the two groups. The analysis of maxi-
mum separation yields an F value which is
statistically significant beyond the .001 level.


Table 1
Comparison of AVA Resultant Profiles (4 Vectors) for Three Year Successful and Unsuccessful
Life Insurance Agents

          Successful Agents     Unsuccessful Agents   Both
          (N = 108)             (N = 414)             (N = 522)
Variate   X̄          σ          X̄          σ          X̄          σ         r       t      P
V-1        6.88785   8.29838     4.59277   8.65429     5.06130   8.63578   +.135   3.11   <.01
V-2       10.81308   7.43325     8.94457   8.11598     9.33142   8.01503   +.118   2.71   <.01
V-3       -8.81308   6.57478    -6.80722   6.74772    -7.22222   6.76171   -.150   3.46   <.001
V-4       -9.48598   5.96032    -7.27228   6.24560    -7.72989   6.25280   -.179   4.15   <.001


Table 2
Four-Variate Discriminant Analysis Data for Resultant
Profiles of Three Year Successful (N = 108) and
Unsuccessful (N = 414) Life Insurance Agents

          Difference in Means          Discriminant
Variate   (Successful-Unsuccessful)    Weights        F             P
V-1       +2.29508                     +.000018
V-2       +1.86851                     +.000045       [illegible]   <.05
V-3       -2.00586                     -.000058
V-4       -2.21370                     -.000020

Hence, from the data of the preceding tables 
it would appear that both sets do possess in- 
dividually significant validity in predicting 
the success or failure of life insurance agents 
at the end of three years. The next problem 
was to determine whether the predictive effi- 
ciency would be enhanced if these variates 
were combined into one battery or whether 
their use as two independent predictors would 
yield more efficient results. The answer to 
this question was forthcoming from a correla-
tion matrix ² which revealed the relative in-
dependence of the individual variates of these
two batteries. The correlational data showed 
that the personal history measures are not 
only independent of the personality variables 
but are also uncorrelated with each other. 
On the basis of these findings it was decided 
to employ these two predictor sets as inde- 
pendent batteries each with its own minimum 
cut-off score. 

Discriminant scores were calculated for
each of the 522 Ss employing the weights re-
ported in Tables 2 and 4. Then, since dif-
ferentiation is independent of the units used,
the weighted composite scores were multi-
plied by the constant 10,000 in order to re-
move unnecessary decimal places. The dis-
tributions of these linear composite scores
representing the AVA profiles yielded a mean
of 13.08 and SD of 8.15 for the successful
group, and a mean of 10.16 and SD of 8.02
for the unsuccessful group. For the total
sample, the AVA discriminant scores ranged
from -12 to +36. The linear discriminant
score distributions for the personal history
variates yielded a mean of 101.20 and SD of
11.20 for the successful group, and a mean of
95.63 and SD of 9.24 for the unsuccessful
group. For the total sample, the personal his-
tory discriminant scores ranged from 72 to
124.

2 The correlation matrix of resultant AVA and
standardized personal history variables for the sam-
ple has been deposited with the American Docu-
mentation Institute. Order Document No. 6077 from
ADI Auxiliary Publications Projects, Photoduplica-
tion Service, Library of Congress, Washington, D. C.,
remitting in advance $1.25 for photocopies or $1.25
for 35-mm. microfilm. Make checks payable to
Chief, Photoduplication Service, Library of Congress.


Table 3
Distribution and Serial Correlation Statistics for Standardized Personal History Predictor Variables of
Three Year Successful and Unsuccessful Life Insurance Agents

                               Successful Agents     Unsuccessful Agents   Both
                               (N = 108)             (N = 414)             (N = 522)
Variate                        X̄          σ          X̄          σ          X̄          σ         r       t
1. Number of Children          51.93519   8.326094   49.75604   8.79810    50.20690   8.71444   +.126   2.91
2. Educational Level           51.47222   8.521727   49.55556   9.51455    49.95211   9.35011   +.104   2.38
3. No. of Offices Held         51.48148   8.37820    50.53865   8.19579    50.73372   8.24270   +.058   1.33*
4. Monthly Living Expenses     52.79630   9.37366    49.43237   8.99429    50.12835   9.17586   +.185   4.30
5. Amount of Insurance Owned   52.44444   10.54827   49.20531   8.95557    49.87548   9.39949           4.03

* Not significant
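
The scoring step can be illustrated with the published weights. The sketch below is not the authors' computing routine; it simply applies the Table 2 and Table 4 weights to the successful group's mean profile, assuming negative signs on AVA Vectors 3 and 4 (the signs are faint in the tables but are implied by the Table 2 differences in means). Because the score is linear, scoring the group mean profile gives the mean of the group's scores.

```python
SCALE = 10_000  # constant used to remove unnecessary decimal places

# Weights as read from Tables 2 and 4 (signs on V-3 and V-4 assumed negative).
AVA_WEIGHTS = [0.000018, 0.000045, -0.000058, -0.000020]
PH_WEIGHTS = [0.000034, 0.000041, 0.000014, 0.000057, 0.000048]


def discriminant_score(values, weights, scale=SCALE):
    """Weighted linear composite, rescaled by a constant."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return scale * sum(w * v for w, v in zip(weights, values))


# Mean profiles of the successful group (Tables 1 and 3).
ava_success_means = [6.88785, 10.81308, -8.81308, -9.48598]
ph_success_means = [51.93519, 51.47222, 51.48148, 52.79630, 52.44444]

print(round(discriminant_score(ava_success_means, AVA_WEIGHTS), 2))
print(round(discriminant_score(ph_success_means, PH_WEIGHTS), 2))
```

Both results fall within a few hundredths of the reported successful-group means of 13.08 and 101.20; the small differences are what one would expect from weights published to only two significant figures.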

The discriminant score distributions were 
grouped in regular intervals of 10, for each 
of the predictor sets and the probability of 
acceptance was calculated for each interval. 
These data are presented in Table 5. It will 
be noted that as the AVA scores begin to rise 
above 10 and the personal history discrimi- 
nant scores become greater than 100 there is 
a continued increase over chance (P = .20) 
in the probability of acceptance. On the
other hand, as the AVA scores become nega-
tive and the personal history scores go below
90, the probabilities gradually diminish to sig- 
nificantly less than chance. These data sug- 
gest that the major discrimination by the 
AVA occurs in the negative discriminant 
score range. This result is as expected since 
Ss with extremely low scores are those pos- 
sessing profiles which are incompatible with 
the AVA pattern hypothesized to be “best” 
for life insurance salesmen. 
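
The tabulation behind Table 5 amounts to grouping scores into intervals of 10 and taking the proportion of successful agents in each. A minimal sketch with invented scores follows; the function name and data are hypothetical, and the individual scores themselves are not reported in the paper.

```python
from collections import defaultdict


def acceptance_by_interval(scores, successes, width=10):
    """Proportion of successful agents within each score interval.

    Keys are interval lower bounds; floor division places a score of -3
    in the interval -10 to -1, matching the grouping used in Table 5.
    """
    counts = defaultdict(lambda: [0, 0])   # lower bound -> [n_success, n_total]
    for score, succeeded in zip(scores, successes):
        lower = (score // width) * width
        counts[lower][1] += 1
        if succeeded:
            counts[lower][0] += 1
    return {lower: s / n for lower, (s, n) in sorted(counts.items())}


# Invented scores for six agents, for illustration only.
scores = [-3, 5, 12, 15, 18, -12]
succeeded = [False, True, True, False, False, False]
print(acceptance_by_interval(scores, succeeded))
```

Each proportion is then compared with the chance level of .20, the overall proportion of successful agents in the sample.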

In determining the maximum predictive effi-
ciency of the AVA and personal history meas-
ures, various cut-offs were tried. The result-
ing data were analyzed both from the stand-
point of percentage of successful and unsuc-
cessful agents who would have been rejected
by these standards and gross new business
made by these agents over the first three-year
period after hire. These data are presented
in Tables 6 and 7. The data of Table 6 re-
veal that a cut-off score of zero for the AVA
is a highly efficient standard since it would
have rejected 39 unsuccessful agents at the
sacrifice of only 2 successful agents. When
combined with personal history cut-off scores
of 88, 91, and 92 the set remains a highly
efficient predictor system since both reject a
relatively small proportion of agents who
were ultimately successful, and the incidence
of overlap among the relatively high propor-
tion of unsuccessful agents rejected is quite
low. Inspection of the data of Table 7 dis-
closes that the 121 agents rejected by the AVA
cut-off of zero and personal history cut-off of
88 produced over a three-year period a total
of only $31,626,000 in gross new business or
an average of only $87,100 per year. This
amount of production is considerably below
the $200,000 first year and $300,000 post-
first-year standard established by the agency
department of the company conducting this
study. Due to the complex nature of the
problem, it is not possible to determine pre-
cisely or even estimate closely the net loss to
the company occasioned by the employment
of these 121 agents. However, when one
stops to consider the initial training costs
and the salaries paid to these financed agents
for the periods of their employment with the
company it becomes apparent that the reve-
nue brought in by these salesmen was consid-
erably less than the company expense of plac-
ing and keeping them on the payroll.

For cut-offs of A_AVA = 0 and A_PH = 91, there
is a substantial rise in the number of unsuc-
cessful agents who would have been rejected
(45 or 11%). This increment is accom-
panied by a somewhat smaller rate of change
(9 or 8%) for the successful group. From
the standpoint of new sales made over a three-
year period by these agents who would not
have been hired had these standards been en-
forced at the time of hire, the 175 agents
showed gross new business of $47,914,000.
This compares with a total of $196,341,700
worth of life insurance sold by the full sam-
ple of 522 agents over this same period.
Hence, for the first two sets of cut-offs 16%
and 24%, respectively, of the total revenue
were attributable to 23% and 33% of the
agents of the study. Table 7 shows, how-
ever, that as the cut-off scores are increased
beyond this point, loss in total revenue (43%
for A_AVA = 6, A_PH = 92) would be substan-
tial and presumably too great to be recouped
entirely through more efficient standards of
selecting salesmen. Accordingly, the set of
A_AVA = 0, A_PH = 91 was determined the most
practical and efficient.


Table 4
Five-Variate Discriminant Analysis Data for Standardized Personal History Variables of Three Year
Successful (N = 108) and Unsuccessful (N = 414) Life Insurance Agents

                               Difference in Means          Discriminant
Variate                        (Successful-Unsuccessful)    Weights
1. Number of Children          +2.17915                     +.000034
2. Educational Level           +1.91666                     +.000041
3. No. of Offices Held         +0.94283                     +.000014
4. Monthly Living Expenses     +3.36393                     +.000057
5. Amount of Insurance Owned   +3.23913                     +.000048


Table 5
Probabilities (of Acceptance) Based upon Linear
Discriminant Function Analysis of AVA and
Personal History Data for Validation
Sample of Life Insurance Agents
(N = 522)

Discriminant Score Interval       Probability of Acceptance*
AVA           Personal History    AVA     Personal History
30-39         120-129             .40     .50
20-29         110-119             .30     .34
10-19         100-109             .23     .28
0-9           90-99               .18     .17
-10 to -1     80-89               .05     .15
-20 to -11    70-79               .00     .00

* The level of chance is at .20 since the successful-unsuccessful
split was in the ratio of 1 to 4.
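
The percentages quoted in this paragraph can be checked directly from the counts and dollar totals given in the text. The short sketch below does only that arithmetic; the dictionary labels are informal shorthand for the two cut-off sets.

```python
TOTAL_AGENTS = 522
TOTAL_PRODUCTION = 196_341_700   # dollars sold by the full sample in 3 years


def share(part, whole):
    """Percentage that `part` represents of `whole`."""
    return 100 * part / whole


# (agents rejected, their three-year gross new business), from the text.
cutoff_sets = {
    "AVA 0 / PH 88": (121, 31_626_000),
    "AVA 0 / PH 91": (175, 47_914_000),
}

for label, (n_agents, dollars) in cutoff_sets.items():
    print(label,
          f"agents: {share(n_agents, TOTAL_AGENTS):.1f}%",
          f"revenue: {share(dollars, TOTAL_PRODUCTION):.1f}%")

# Average yearly production per rejected agent under the first set.
per_agent_per_year = 31_626_000 / 121 / 3
print(round(per_agent_per_year))
```

The computed shares agree with the reported 23% and 33% of agents and 16% and 24% of revenue to within a percentage point, and the yearly average per agent comes to roughly $87,100.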


Table 6
Predictive Efficiency of Combined AVA and Personal History Variables in the Selection of
Life Insurance Agents

[Table body not recoverable. Column headings: Discriminant Score Cut-offs for AVA and PH;
Number of Agents Failing to Meet Standards (AVA Alone, PH Alone, AVA + PH, each split
into S and U); Percentage Rejected by These Standards (3 Yrs. Successful, 3 Yrs.
Unsuccessful). Five rows of AVA (0) / PH cut-off combinations; the PH values are illegible.]
Table 7
Paid Production of Life Insurance Agents Failing to Meet Varying Selection Criterion Standards

Discriminant                        Paid Production*
Score Cut-offs     Category         1st Yr.   2nd Yr.   3rd Yr.   Total
AVA (0) PH (88)    Successful        2,740     4,531     6,028    13,299
                   Unsuccessful      9,310     5,479     3,538    18,327
                   Total            12,050    10,010              31,626
AVA (0) PH (91)    Successful        4,775                        22,023
                   Unsuccessful     13,083                        25,891
                   Total            17,858                        47,914

* In thousands of dollars.

[The remaining three cut-off sets are only partly legible. Their Total columns read
27,208 / 28,194 / 55,402; 39,182 / 31,031 / 70,212; and 47,567 / 37,849 / 85,416
(successful / unsuccessful / total). Other surviving entries that cannot be assigned
to cells with confidence: 5,695; 14,246; 19,941; 10,310; 8,565; 18,875; 14,409; 9,616;
24,025; 17,218; 11,918; 29,136; 18,948; 31,024.]


Among the 15 remaining personal history
variables which proved not to be individually
discriminative in terms of differences in over-
all frequency distributions, the factor of age
appeared to possess some significant discrimi-
natory power at both extreme levels. The
data showed that persons who are near or
over age 45 or near or below age 25 at the
time of hire are not likely to succeed as life
insurance salesmen at the end of three years.
By setting the age limits at 25 and 45 years
(rejecting applicants below 25 or above 45), an
analysis was made of the number of addi-
tional agents who would have been rejected
by these criterion standards and who would
have successfully passed the two other screens.
It was found that with the criterion scores of
A_AVA = 0, A_PH = 91, which were judged to
be the most efficient, an additional 32 unsuc-
cessful agents would have been rejected at
the sacrifice of only 3 successful agents. The
new business volume of these 35 individuals
over a three-year period was $10,554,000 or
an annual average of $100,419 per agent.
This figure is far below the accepted stand-
ard of required performance. Hence, there
appeared to be ample evidence to conclude
that applicants for life insurance agent who
are less than 25 years or more than 45 years
old are not likely to succeed in selling life
insurance over a sustained period of time.
When the age criterion variable with these
two critical cut-off scores was combined with
the multiple cut-off set of A_AVA = 0, A_PH = 91,
it was found that an increase of 3% of the
proportion of rejection of successful agents
resulted but that the rejection of unsuccess-
ful agents was increased by 7%. When these
figures are analyzed in terms of total num-
bers in each group (22 for successful, 188 for
unsuccessful) the gain in predicted efficiency
for the total set is highly significant.


Summary and Conclusions 


Temperament characteristics as measured 
by the AVA and also various personal history 
measures were investigated as to their predic- 
tive efficiency in the selection of life insur- 
ance salesmen. A total of 522 financed male 
agents employed full time in selling life insur-
ance were studied three years after hire.

The findings of this study disclose that ap-
plicants for life insurance agent are not likely
to be successful in selling life insurance over
a sustained period of time if, temperament-
wise, they perceive themselves as passive and
submissive individuals rather than as aggressive
and socially confident persons.

The findings also reveal that personal-so- 
cial data are only of limited value when con- 
sidering these variables as predictors of suc- 
cess of life insurance agents. However, 5 of 
20 of these measures proved to be good dis- 
criminators of successful-unsuccessful agents 
when combined in a battery. One of the per- 
sonal history variables, age, showed that it 
could be used, individually, as a predictor of 
failure in life insurance selling for those ap- 
plicants whose ages are below and above cer- 
tain minimum and maximum levels. 

The data of the study also show that tem- 
perament characteristics, as measured by the 
AVA, and the discriminating personal history 
variates are uncorrelated, thereby making it 
possible to establish independent screens for 
selecting life insurance salesmen. They fur- 
ther point to the predictive efficiency of these 
personality and personal measures in deter- 
mining success or failure among the agents of 
the study, and suggest criteria to be evalu- 
ated when considering the employment of ap- 
plicants to this position. 

The following conclusions are held to be 
tenable from the data of this study: 


(a) The AVA is a valid predictor of suc-
cess-failure among life insurance agents.

(b) Certain personal history measures are
valid predictors of success-failure
among life insurance agents.

(c) Combining AVA and personal history
data enhances the predictive efficiency
of these measures in determining the
success or failure of life insurance
agents over a sustained period of time.


Peter F. Merenda and Walter V. Clarke


Much current research in human communi-
cation relates to group process, information
theory, content analysis, and the psychophysi-
cal study of messages. Little has been pub-
lished on the relations between defined prop-
erties of persuasive messages (the words) and
the attitude change they influence.

Hovland, Janis, and Kelley (1953) have 
reported that emotionality of the message it- 
self is a significant determinant of attitude 
change in message recipients, in the case of an 
unpleasant emotion, fear. Lasswell, in Smith, 
Lasswell, and Casey (1946) has discussed 
theme emphasis and position effects in mes- 
sages. Doob (1948) has applied learning 
theory concepts to propaganda and public
opinion. 

This study tests one theoretical deduction 
from a general quantitative theory of the re- 
lationships between message properties and 
properties of the message recipient’s behavior. 
The hypothesis is that response potential is a 
function of message unity. 


Definitions 


Message unity, approximately, may be 
thought of as the total persuasive effect or 
“pull” of the message. Precisely character- 
ized, message unity is the sum of the message 
theme feeling-tones, each weighted by the 
theme’s emphasis in the message and the 
reciprocal of the theme’s similarity to the 
most emphasized theme. The definition of 
message unity is based on a rational judgment
of the authors, grounded partly in the
work of the above-cited authors.


1 A more complete account may be found in the
original M.A. thesis of the first named author, on
file at Western Reserve University, Cleveland, Ohio.
2 C. R. Porter served as faculty advisor at Western
Reserve University, Cleveland, Ohio, for this
study. The first named author wishes to acknowledge
his invaluable guidance and assistance. Without
it, this study could not have been completed in
its present form.


Response potential is the tendency of the 
message recipient to perform the action to 
which the message would persuade. Response 
potential is the score on a questionnaire whose 
items are weighted in terms of the correlation 
between the mean response potential for a 
given message and the number of individuals 
induced to act as the message requests. 

A message as here defined is any product 
of language behavior. The message can be 
divided into rhetorical elements called themes. 
Each theme is a concept embodied in one or 
more sentences of the message. These three 
classes of theme properties may be identified: 

1. Theme feeling-tone is the rated intensity 
of the motive evoked by the theme. 

2. Theme emphasis is the rated degree of 
attention demanded by the theme. 

3. Theme similarity is the rated extent to 
which a given theme supports the most em- 
phasized theme in the message. Message 
unity is thus defined in terms of the three 
sets of theme properties. 
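
The verbal definition of message unity can be written out as a short computation. The sketch below is one reading of that definition, assuming the two weights combine multiplicatively (feeling-tone times emphasis, divided by similarity); the class name, field names, and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Theme:
    """Mean adjusted ratings for one theme (illustrative field names)."""
    feeling_tone: float   # rated intensity of the motive evoked
    emphasis: float       # rated degree of attention demanded
    similarity: float     # rated support of the most emphasized theme (> 0)


def message_unity(themes):
    """Sum of feeling-tones, each weighted by emphasis and 1 / similarity."""
    return sum(t.feeling_tone * t.emphasis / t.similarity for t in themes)


# Two invented themes rated on the 1-10 scales described in the Method.
example = [Theme(feeling_tone=8, emphasis=5, similarity=10),
           Theme(feeling_tone=6, emphasis=3, similarity=2)]
print(message_unity(example))   # 8*5/10 + 6*3/2 = 13.0
```

Under this reading a message gains unity both by adding themes and by giving strong emphasis to themes with intense feeling-tone, which is consistent with Message C, the message with the most themes, having the highest unity.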


Method 


To test the research hypothesis, that response po-
tential is a function of message unity, three messages
with widely disparate levels of message unity were
disseminated by direct mail advertising. The mes-
sages, which requested the reader to purchase vita-
mins, were written on the basis of the generalized
motive intensities of a sample of 107 vitamin pur-
chasers, who were asked to rank a series of motives
for self-importance. Out of the three messages there
were abstracted critical word groups, each embodying
at least one theme. Thirteen themes were abstracted.
Four themes appeared in Message A, 10 in Message
B, and 13 in Message C.

To determine message unity, four raters first evalu-
ated theme emphasis, theme similarity, and theme
feeling-tone. Message themes were exemplified to
each rater by means of phrases expressing each theme.
Emphasis, similarity, and feeling-tone of themes were
defined for the raters by examples. Raters were
asked to assign to each theme in each message a
number from 1-10, first for emphasis, then for simi-
larity, then for feeling-tone. Judgments were to be
made in accordance with rating criteria suggested by
Guilford (1954).


Table 1
Message Unity, Mean Response Potential, and Number
of Vitamin Orders for Three Messages

                                            Number of Vitamin
                           Mean Response    Orders per 1,000
Message   Message Unity    Potential*       Messages Mailed
A         16.20            14.07            34
B         70.59            16.35            47
C         90.16            16.39            58

* One response potential questionnaire was received with
each order, hence number of questionnaires per 1,000 messages
is identical with number of orders.


To eliminate biases, the ratings were adjusted ac-
cording to methods which Guilford (1954) suggests,
to correct for leniency and halo error, rater-theme in-
teraction and rater-message interaction. After ad-
justment, the ratings of the four judges were aver-
aged. The reliability of unadjusted and adjusted
ratings was assessed by the average rank-order in-
tercorrelation between raters. From the mean ad-
justed ratings, message unity was calculated in terms
of the definition that message unity is the sum of
theme feeling-tones, each weighted by theme empha-
sis and reciprocal theme similarity.

The three messages were sent to 3,000 vitamin
buyers, divided into three equal groups. Names of
purchasers were procured from a commercial list of
previous vitamin buyers. The 3,000 names were
received on 8.5” by 11” sheets of gummed labels with
sixty names per sheet. Systematic sampling was
performed by cutting each sheet of labels into three
parts of twenty labels each. The parts were then
shuffled like a deck of cards and the resulting pile of
partial sheets was separated into three piles by
dealing from the top in serial order. Thereby the entire
3,000 names were each assigned to one of three groups.
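
The cut, shuffle, and deal procedure can be illustrated with a short Python sketch. The label names, the function name, and the fixed random seed are assumptions made for the illustration; only the sheet size (60), the part size (20), and the three piles come from the text.

```python
import random


def assign_groups(names, per_sheet=60, parts_per_sheet=3, n_groups=3, seed=0):
    """Cut sheets of labels into parts, shuffle the parts, deal them into piles."""
    part_size = per_sheet // parts_per_sheet
    sheets = [names[i:i + per_sheet] for i in range(0, len(names), per_sheet)]
    # Each sheet of 60 labels is cut into three parts of 20 labels.
    parts = [
        sheet[j:j + part_size]
        for sheet in sheets
        for j in range(0, len(sheet), part_size)
    ]
    # "Shuffled like a deck of cards": a seeded shuffle keeps the sketch repeatable.
    rng = random.Random(seed)
    rng.shuffle(parts)
    # Deal from the top in serial order: part 0 to pile 0, part 1 to pile 1, ...
    groups = [[] for _ in range(n_groups)]
    for k, part in enumerate(parts):
        groups[k % n_groups].extend(part)
    return groups


names = [f"buyer_{i:04d}" for i in range(3000)]  # hypothetical labels
groups = assign_groups(names)
print([len(g) for g in groups])
```

With 3,000 names the cut yields 150 parts of 20 labels, so each pile receives 50 parts and the three groups are equal in size.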

Response potential was construed as the score from
a questionnaire accompanying the messages. It
consisted of a series of forced-choice questions, presumed
capable of detecting tendency to act (here, to buy
vitamins). The alternatives of each question were
weighted according to a method proposed by
Guilford (1954), in which empirical weights were
assigned based on proportions of response and
validated in terms of mean response potential for a
message and the number of vitamin orders the
message produced.


Results and Discussion 


Reliabilities, as measured by mean rater 
intercorrelations for adjusted themes, from 
which message unity was calculated, ranged 
from 0.50 to 0.87. Validating correlation be- 
tween mean response potential for a given 
message and the number of vitamin orders for 
the message was found to be 0.95. Basic data 
are given in Table 1. 

Product-moment correlation coefficient between
individual response potential and the
three values of message unity was 0.57, which
was statistically greater than zero at a t value
significant beyond the 1% level of confidence.
A linear relationship was established between
individual response potential and message
unity. The relationship is expressed by the
equation:


$U_r = 0.036\,U_m + 13.6$


where $U_r$ is individual response potential and
$U_m$ is message unity. By setting confidence
intervals, it was discovered that the slope
ranged between 0.017 and 0.055 and the
intercept between 12.2 and 14.9. The slope
was shown to be statistically different from
zero by a t value significant beyond the 1%
level of confidence. An analysis of variance
technique revealed statistically insignificant
deviations from linear regression for
individual response potential on message unity.
(Deviations from linear regression < 0.10.)
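
As a rough check, the reported line can be evaluated at the three message unity values of Table 1 and compared with the mean response potentials listed there. This is a sketch, not the authors' analysis: the line was fitted to individual scores, so the group means are only expected to fall near it. Variable names are illustrative.

```python
SLOPE = 0.036      # reported slope
INTERCEPT = 13.6   # reported intercept

# (message unity, mean response potential) for the three messages in Table 1
table_1 = [(16.20, 14.07), (70.59, 16.35), (90.16, 16.39)]


def predicted_response_potential(message_unity):
    """Response potential predicted by the reported regression line."""
    return SLOPE * message_unity + INTERCEPT


residuals = [
    round(observed - predicted_response_potential(unity), 2)
    for unity, observed in table_1
]
print(residuals)
```

All three group means lie within half a point of the fitted line.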

Statistically, response potential is a linear
function of message unity; a fortiori there is
some relationship between the two variables, 
and verification is lent both to the research 
hypothesis and the theory from which it is de- 
rivable. Broader substantiation is needed, of 
course. 


Received October 1, 1958. 
EFFECT OF SUBLIMINAL CUES ON TEST RESULTS 1
Recent assertions that presenting visual
stimuli at exposure speeds in excess of
conscious perception can influence behavior have
caused widespread interest and some concern
(Brooks, 1957; Cousins, 1957). McConnell,
Cutler, and McNeil (1958) have recently
reviewed the work in subliminal perception and
point to the lack of experimental evidence
supporting this technique as a means of
influencing behavior. Klein (1955) and his
co-workers found that different sexual and
symbolic figures exposed subliminally provoked
different impressions of consciously perceived
pictures of people. Smith and Henriksson
(1955) have demonstrated that subliminal
stimuli measurably affect conscious
perception. These carefully controlled experiments
involve presenting stimuli to individuals
subliminally. Presenting stimuli to groups in
which individual differences are not controlled
would be similar to the conditions experienced
by advertisers. This research is concerned
with the extent to which individuals might be
influenced in educational achievement.
Subliminal cues suggesting correct answers to test
questions should increase test scores and,
conversely, subliminal cues which misinform
should affect test scores adversely. The need
to do well on tests should lower the threshold
of Ss.


Method 
Sixty-two Ss were drawn from a section of
elementary general psychology and were divided into
two groups on the basis of sex and upon scores
earned on the first major test. Two Ss, one in each
group, failed to complete the experimental work and
were dropped. Three filmstrips of 50 test items
each were constructed from material in the text. All
items were multiple-choice items with four
alternative choices. Slides were made representing the
numbers of the alternative choices. Two standard slide
projectors were employed. One was used to
present the test items on the filmstrip and the other was
used to present tachistoscopically the correct or
incorrect alternative choice of the test item.


1 This research was supported by the University
Research Fund, Utah State University.
A Graphex shutter made by Wollensak was adapted
to one slide projector to present the subliminal cues.
The accuracy of the shutter was checked with a
Hewlett-Packard Electronic Counter, Model 522B.
With the shutter set at 1/200 sec. the actual exposure
time was 11.84 ± .08 msec. Each test item was
exposed from 25 to 35 sec. depending on the length of
items. The subliminal cues were superimposed upon
the test item 3 times, at approximately 7-sec.
intervals, with the shutter speed set at 1/200 sec. and
shutter opening at F32.

Filmstrip No. 1, the first major test, furnished
one criterion for group placement. Blank slides were
used in the second projector to keep light, noise, and
other activity associated with the operation of
projectors constant. At each test session the filmstrip
was presented twice; answer sheets were collected
after each presentation. Filmstrip No. 2, the second
major test, was shown first with blank slides used
in the second projector; and during the second
showing, slides with the number of the correct
alternative choice were superimposed upon the test item as
subliminal cues. This was used as a check to
determine whether or not the groups differed significantly
in ability to respond to the subliminal cues.
Filmstrip No. 3, the third major test, was shown first
with blank slides, and on the second showing, Group
1 had alternate right and wrong cues presented and
Group 2 had only right cues presented.


Ss were told that a new method in test
presentation was being developed and that they were not to
discuss any procedures or conditions until after the
three tests were completed. At the close of each
testing session Ss were asked to note on the back of
their answer sheet whether or not anything unusual
occurred during the test period. They were asked
to list any cues of a helpful nature or any
distraction in connection with the test situation. Comments
concerning the test or test procedure were
encouraged. Ss were called in for brief interviews after
the test sessions.


Results 


An examination of individual performances 
revealed that 16 Ss, 9 from Group 1 and 7 
from Group 2, were able to respond to the 
cues throughout the experiment. Eight Ss, 3
from Group 1 and 5 from Group 2, were
unable to experience any of the cues. Sixty
percent, 36 Ss, were unable to respond to the
subliminal stimuli on Test 2, second showing,
but were able to see similar cues on Test 3,
second showing. Since a mean score change
of 5 or more points was significant at the .01
level, an individual score change of 6 or more
points was considered sufficient to indicate
influences other than a practice effect. Of
the 16 Ss reporting an ability to see the cues,
all showed positive gains and 14 showed
changes of 6 or more points. Of the 8 Ss
reporting no ability to see, only 2 showed such
changes. Of the 36 Ss who reported no ability
to respond to cues on their first experience,
5 Ss had score changes of 6 or more points.
On their second experience all showed changes
of 2 or more points and 26 Ss had changes of
6 or more points.
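
The subgroup counts reported above can be tallied in a few lines, which also confirms that they account for all 60 Ss. The dictionary labels are shorthand introduced for this sketch; the counts are those given in the text.

```python
# Number of Ss in each response pattern, as reported in the text.
counts = {
    "saw cues throughout": 16,
    "never saw cues": 8,
    "saw cues only on Test 3": 36,
}

# Ss in each pattern with score changes of 6 or more points
# (for the third pattern, on their second experience with the cues).
large_gains = {
    "saw cues throughout": 14,
    "never saw cues": 2,
    "saw cues only on Test 3": 26,
}

total = sum(counts.values())
share_of_sample = {k: round(100 * v / total, 1) for k, v in counts.items()}
share_with_gain = {
    k: round(100 * large_gains[k] / counts[k], 1) for k in counts
}
print(total, share_of_sample, share_with_gain)
```

The 36 Ss who first responded on Test 3 make up the 60% figure cited in the text.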

A comparison of the groups is shown in 
Table 1. Group 2 performed progressively 
less well on the first showing of the filmstrip. 
The reversal of differences on Test 3 between 
the first and second showing likely resulted 
from Ss responding to the wrong answers, 
thus lowering the mean score. 

Table 2 shows a comparison between the 
mean scores earned by the groups on the first 
and second showings of the filmstrip.
Experimental variables were introduced on the
second showing of Test 2 and Test 3. The
influence of the correct cue raised the mean
scores significantly (p < .001), and when
alternate right and wrong cues were used
(Group 1, Test 3), no rise in test score
occurred. Sixteen of the 30 Ss had scores within
2 points of the 25 correct cues presented. No
evidence was found to indicate a sex
difference in experiencing the subliminal cue.


Table 1

Comparison of the Mean Test Scores between Groups
(N = 30)

                    M          M          M
Test   Showing   Group 1    Group 2     Diff.      p

 1      1st       28.26      27.27      0.99
 1      2nd       28.40      28.10      0.30
 2      1st       27.10      23.80      3.30      .10
 2      2nd       32.87      30.77      2.10
 3      1st       30.47      26.40      4.07      .05
 3      2nd       30.90      36.23      5.20      .01


Heber C. Sharp


Table 2

Comparison of Means on First and Second
Showings of the Test Films
(N = 30 in all cases)

                     Group 1            Group 2
Test   Showing      M       SD         M       SD

 1      1st       28.26    7.60      27.27     --
 1      2nd       28.40    6.92      28.10    6.79
 2      1st       27.10     --       23.80    7.44
 2      2nd       32.87     --       30.77   10.69
 3      1st       30.47     --       26.40    8.31
 3      2nd       30.90     --       36.23    8.26

-- Value not legible.


Discussion

The majority (60%) of Ss learned to respond
to the stimulus after it had been presented
one or more times. Bricker and
Chapanis (1953) have found that when a
stimulus slightly below the limen was
presented once, the likelihood of its being
recognized on subsequent trials increased. These
data show such an increase.

The tests used in this study were a major 
factor in determining S’s grade for the course. 
It seems reasonable to assume that Ss would 
have a fairly high need to do well on the tests 
although no measure for need achievement 
was employed. McClelland and Lieberman 
(1949) have shown that Ss with a high need 
to achieve had lower thresholds for success 
words than Ss who scored low on need achieve- 
ment. 

Boswell’s (1958) data show a significantly
greater recognition of stimuli at threshold
levels on the ascending function of the alpha
cycle. This may offer an explanation for Ss
who experienced a sudden awareness of the
cues; after once recognizing the cue, the
likelihood of recognition of other cues seemed to
be increased.

Monnier (1952) and Cobb and Morton 
(1952) have shown that as the stimulus in- 
tensity decreases, the retino-cortical
transmission time increases. The use of low intensity
stimuli for subliminal work would, therefore,
require a longer transmission time than would 
the better illuminated material upon which it 
was superimposed. The difference in light in- 
tensity of the two stimulating sources—test 
items and the subliminal cue—would necessi- 
tate two transmission rates occurring some- 
what simultaneously over the visual path- 
ways. This could lead to confusion and may 
tend to make S respond to one or the other. 
The Ss who attended to the test item would 
likely not be aware of the cue and it would 
remain subliminal for them. On the other 
hand, the Ss who attended to the cues and 
not to the test items would show test scores 
following the pattern of the cues shown. 
This would explain the tendency of Ss in 
Group 1, Test 3, to respond incorrectly when 
the cue indicated a noncorrect answer. Ss 
may be aware vaguely of the test item and 
may feel that an occasional cue was wrong 
but would respond to the cues since it was 
the chosen method of attack. Introspective 
reports of Ss in Group 1 tended to support 
this explanation. 

Four possible factors have been suggested
to explain Ss’ behavior in learning to respond
to stimuli which were, for the large majority
of Ss, subliminal when first presented. These
factors were: (a) a lowering of the threshold
as stimuli were repeated; (b) the influence
of a need to do well seemed to increase a
search for cues to help in responding to test
items, hence, a lowering of threshold for those
cues which aided in recording the answer;
(c) the likelihood of a cue once presented on
the optimal phase of the alpha cycle enhanced
its subsequent recognition; and (d) the
difference of possible neural conduction rates
forcing S to choose to respond either to the
cue or to the test item.


Summary 


Sixty Ss from a general psychology section 
were divided into two groups. Subject mat- 
ter tests were projected on the screen (items 
on filmstrips). During the control periods
blank slides were used for tachistoscopic
presentation. In the experimental sessions
correct answers, and in one session alternate
right and wrong answers, were presented
subliminally (11.84 ± .08 msec.). The “hidden”
cue was superimposed upon the item at low
illumination intensity three times during the
exposure period of the test item.

Significant (p < .001) positive changes in 
mean scores occurred when right answers were 
presented as the cue. 

When alternate right and wrong answers 
were presented, 16 of the 30 Ss in Group 1 
received scores within 2 points of the 25 cor- 
rect answers presented. Although 50% of Ss 
recognized that some answers were wrong, 
they tended to record these wrong answers. 

A majority (60%) of the Ss learned to 
perceive consciously the “hidden” stimulus. 

No sex differences were found. 


REFERENCES


Boswell, R. S. An investigation of the phase of
the alpha rhythm in relation to visual recognition.
Unpublished doctoral dissertation, Univer. of Utah,
1958.

Bricker, P. D., & Chapanis, A. Do incorrectly
perceived tachistoscopic stimuli convey some
information? Psychol. Rev., 1953, 60, 181-188.

Brooks, J. The little ad that isn’t there. Consumer
Rep., 1957, 23, No. 1.

Cobb, W., & Morton, H. B. The human retinogram
in response to high intensity flashes. EEG Clin.
Neurophysiol., 1952, 4, 547-556.

Cousins, N. Smudging the subconscious. Saturday
Rev., 1957, 40, No. 40.

Klein, G. S., Spence, D. P., Holt, R. R., &
Gourevitch, Susannah. Preconscious influences upon
conscious cognitive behavior. Amer. Psychologist,
1955, 10, 387. (Abstract)

McClelland, D. C., & Lieberman, A. M. The effect
of need for achievement on recognition of need
related words. J. Pers., 1949, 18, 236-251.

McConnell, J. V., Cutler, R. L., & McNeil, E. B.
Subliminal stimulation: An overview. Amer.
Psychologist, 1958, 13, 229-242.

Monnier, M. Retinal, cortical, and motor responses
to photic stimulation in man: Retino-cortical time
and optomotor integration time. J. Neurophysiol.,
1952, 15, 469-486.

Smith, G. J. W., & Henriksson, M. The effect of
an established percept of a perceptual process
beyond awareness. Acta psychol., 1955, 11, 346-355.


(Received October 7, 1958)





Journal of Applied Psychology 
Vol. 43, No. 6, 1959 


SIMULATED PATTERNS ON THE EDWARDS
PERSONAL PREFERENCE SCHEDULE 1


CHARLES F. DICKEN 2


Counseling and Testing Center, Stanford University 


An important problem in the use of
inventories of interest and personality is the
susceptibility of the scores to simulation. At
least three forms of conscious attitude may
result in scores which fail to represent
accurately the characteristics of the individual:
(a) deliberate “faking” with intent to deceive
the test user, (b) response in terms of an ideal
self concept rather than a candid
self-appraisal, and (c) response in terms of an
“honest” but inaccurate or uninsightful
self-assessment.

One line of approach to the simulation 
problem has been concern for item selection 
and item subtlety (Gough, 1954; Meehl, 
1945; Seeman, 1952; Wiener, 1948). An- 
other approach has been the development of 
validity scores for detecting or counterbalanc- 
ing bias introduced by test-taking attitude 
(Gough, 1952; Humm, Storment, & Iorns, 
1944; Meehl & Hathaway, 1946). 

Simulation has been investigated experi- 
mentally by asking Ss to assume a specified 
role in responding to test items. Reviews 
(Gough, 1950; Meehl & Hathaway, 1946) of 
the extensive literature on role playing of 
the “fake good” and “fake bad” dimensions 
identified by Meehl and Hathaway (1946) 
indicate that the validity score approach, 
when available, is reasonably efficient in de- 
tecting these forms of simulation. Role-play- 
ing studies of structured inventory scales or 
patterns relating to specific personality traits 
or interest attributes (Bordin, 1943; Gough, 
1947; Kelly, Miles, & Terman, 1935;
Longstaff, 1948; Sundberg & Bachelis, 1956;
Sweetland, 1948; Wesman, 1952) have
consistently found substantial alterations in the
scores of Ss instructed to simulate. Validity
scores have ordinarily been unavailable in
studies of this type, although there is some
evidence that they can be effective (Gough,
1947).


1 The author gratefully acknowledges the assistance
of Ralph Granneberg in obtaining and testing the Ss
and of John Black in the data analysis.

2 Now at the University of Chicago.

The most recent line of attack on the prob- 
lem of the descriptive accuracy of structured 
inventory scores concerns what Jackson and 
Messick (1958) have termed “stylistic” de- 
terminants of item response. Tendencies to 
acquiesce and to respond in terms of the so- 
cial desirability of the item are two major 
instances of stylistic determinants. Jackson 
and Messick reviewed the experimental
evidence and concluded, “. . . stylistic
determinants . . . as distinct from specific
content, account for a large proportion of
response variance on some personality scales,
particularly the California F scale, the MMPI,
and the California Psychological Inventory”
(1958, p. 250).

The Edwards Personal Preference Schedule 
(EPPS) (Edwards, 1957) was constructed to 
measure a set of personality variables drawn 
from Murray’s (1938) list of manifest needs. 
The unique feature of the Schedule is an at- 
tempt to control the social desirability (SD) 
factor by means of a forced-choice format in 
which paired items scored for different
variables are equated for independently judged
SD. Control of SD would presumably elimi- 
nate one means by which a test S can obtain 
scores which are not truly characteristic of 
him, that of responding in the socially desir- 
able direction. 

Recent evidence on the EPPS casts doubt 
on the success of the control of SD and on 
the resistance of the Schedule to simulation. 
Corah, Feldman, Cohen, Grune, and
Ringwall (1958) found that 20 of 30 item pairs
sampled from the EPPS differed significantly
in intrapair SD when judged as pairs, and
found a high correlation between these
differences and the probability of endorsement of
the items. Borislow (1958) studied EPPS
response changes in Ss who were first tested
under standard instructions and then asked
to role-play social desirability or personal
desirability. Both role-playing groups differed
significantly in number of item responses
altered and in test-retest profile correlations
from a control group retested under standard
instructions. Neither the consistency score
nor profile stability coefficients discriminated
simulated profiles from controls. Borislow
interpreted his findings as indicating
susceptibility of the EPPS to faking, but his small
Ns prevented a descriptive analysis of score
changes for the two role-playing samples, and
he rejected the hypothesis of a differential
effect of the two role-playing conditions.

The present study investigated the
qualitative properties of EPPS score changes under
four different role-playing instructions. The
hypotheses were: (a) Subjects motivated to
simulate a personality trait are capable of
inducing substantial changes in their EPPS
scores. (b) Substantial score changes will
occur under the role-playing of a “good
impression,” in spite of the attempted control
of the SD factor. (c) Subject groups that
role-play different personality variables will
obtain different simulated patterns. (d) The
consistency score is not an effective index of
simulation.


3 The names of the EPPS variables are as follows:
Achievement (ach), Deference (def), Order (ord),
Exhibition (exh), Autonomy (aut), Affiliation (aff),
Intraception (int), Succorance (suc), Dominance
(dom), Abasement (aba), Nurturance (nur), Change
(chg), Endurance (end), Heterosexuality (het),
Aggression (agg). A consistency score (con) is also
computed, based on the number of identical choices
made in two sets of the same 15 items.


Method 


The EPPS was administered with standard
instructions to 75 students in five introductory psychology
classes at the City College of San Francisco. The
Ss for the experiment ranged in age from 18 to 30.
They were permitted to identify their records by
code numbers to preserve anonymity.

The sample was then divided into four
role-playing groups: need order (ORD), 8 males, 9 females;
need dominance (DOM), 8 males, 11 females; need
change (CHG), 13 males, 7 females; and good
impression (GI), 8 males, 11 females. The first three
roles were chosen to represent a variety of the EPPS
variables and to correspond roughly to three
variables under investigation in a parallel study of
simulation of the California Psychological Inventory. The
fourth role relates to Hypothesis b.

Each group was retested separately with
instructions to simulate for the purpose of winning an
imaginary but highly desirable college scholarship.
Subjects in each of the first three groups were told to
suppose a hypothetical “scholarship committee” used
the EPPS to select individuals with a particular kind
of personality trait. The name of the need variable
and a three- or four-sentence description based on
Murray (1938) and reproduced below were read
to the group and printed on a blackboard visible
throughout the session.

Need for order. A person with a need for order
wants to achieve organization, neatness, and
precision. This kind of person aims for perfection in
details, attempts to keep possessions and work in
careful order, and is exact and precise in speech and
manner. Persons with a need for order behave in
an organized, restrained, and careful manner in
whatever they do.

Need for dominance. A person with a need for
dominance wants to influence, persuade, or direct
other people by suggestion, persuasion, or command.
This kind of person tries to get others to cooperate
with him and to convince them of the rightness of
his opinions. Persons with a need for dominance
desire to lead, influence, guide, govern or supervise
other people. Note that a person with a need for
dominance need not necessarily be domineering or
unpleasant in his conduct.

Need for change. A person with a need for change
seeks variety, newness, and adventure in personal ex 


Table 1

Means, Standard Deviations, and Mean Differences of
EPPS Scores under Standard (Std) and Simulated
Need Order (ORD) Conditions


Table 2

Means, Standard Deviations, and Mean Differences of
EPPS Scores under Standard (Std) and Simulated
Need Dominance (DOM) Conditions


(N = 19) 


DOM 
Scale y Vf SD D 


ach 5 7 10.7** 
def 53.8 &.8 : ie 
ord J2. 2.5 
exh 47. 11. 16.8** 
aut 51. : : 2.3 
aff i= 
int WY ag 
4.4** 
22.6" 
13.3** 
1.9 
15.8°° 


3.5” 


wn 


suc 
dom 
aba 
nur 47.3 
chg 52.5 
end 53.6 
het 49.8 _ 
agg 49.7 9.3** 


con 53.8 . 46.5 ia 


1 2 
ree NNO 


i 


* Differs from zero at .05 level.
** Differs from zero at .01 level.


periences. This kind of person avoids regularity or 
repetition in habits of living, attempting instead to 
experiment and to do things differently. Persons 
with a need for change are flexible and adaptable 
and enjoy changing their methods, habits, and pref- 
erences. 

The GI group was told to respond so as to give 
the most favorable possible impression of themselves 
to the scholarship committee, without further specifi- 
cation of role. One week elapsed between the first 
and second test administrations for all groups 


Results 


Tables 1-4 show the means and standard 
deviations of the EPPS scores of the four role- 
playing groups for standard and simulation 
conditions. The mean difference scores (simu- 
lation condition minus standard condition) 
are also shown.4 The means for the
standard condition are comparable for the four
samples and are, in the main, reasonably close
to those of Edwards’ normative sample.


4 The raw scores were converted to T-score values
appropriate to the sex of the S. Preliminary
analysis of the four samples indicated no substantial sex
differences in either standard or simulation
conditions. There were no male-female reversals of the
direction of mean change scores where both change
score means differed significantly from zero. Data
for male and female Ss were combined for the main
analyses.


The t test for correlated measures was used
to compare the mean differences with a null
hypothesis of zero difference. The effect of
the forced-choice format of the EPPS on 
score changes in simulation should be noted 
in interpreting the outcomes of the signifi- 
cance tests for individual scales. An altered 
item response which increases an S’s score on 
one variable also decreases his score on some 
other variable. Thus while the increases in 
any sample are independent of each other, 
and while the set of decreases is similarly in- 
ternally independent, the increases and the 
decreases are not independent. A conserva- 
tive interpretation would consider the signifi- 
cance of the changes for a single direction 
only (increases being probably of greater in- 
terest here), and would treat the remaining 
changes (e.g., decreases) in terms of relative 
magnitude only. 
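
For readers unfamiliar with the t test for correlated measures, a minimal sketch follows. The T scores are invented for illustration and are not data from this study; the function name is likewise hypothetical.

```python
import math
from statistics import mean, stdev


def correlated_t(standard, simulated):
    """t test for correlated measures against a null of zero mean difference.

    Returns (mean difference, t, degrees of freedom). Differences are taken
    as simulation condition minus standard condition.
    """
    diffs = [sim - std for std, sim in zip(standard, simulated)]
    n = len(diffs)
    d_bar = mean(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses n - 1
    return d_bar, d_bar / standard_error, n - 1


# Hypothetical T scores on one scale for six Ss under the two conditions.
std_scores = [48, 52, 45, 50, 55, 47]
sim_scores = [60, 63, 58, 59, 70, 61]

d_bar, t_value, df = correlated_t(std_scores, sim_scores)
print(round(d_bar, 2), round(t_value, 2), df)
```

The resulting t is referred to a t table with n - 1 degrees of freedom, where n is the number of Ss tested under both conditions.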

The effect of the role instructions on the 


Table 3 


Means, Standard Deviations, and Mean Differences of
EPPS Scores under Standard (Std) and Simulated
Need Change (CHG) Conditions


(N = 20)


Scale S d S D 


47.2 12.9 1.9 


ach 10.3 # 
def ; 7.6 
ord Ae 10.7 43.3 99 
exh : 9.1 57.7 12.8 10.0** 
aut 8 9.1 60.4 10.1 13.6** 
aff 9.1 43.0 7.6 —3.6 
7.9 46.1 6.6 6 
10.4 48.6 7- — 8 
11.1 o1./ 6 = S. 6.9** 
8.2 428 8. 10.4** 
10.5 43.0 7 
10.1 65.1 eR a 
12.9 45.0 -6.6* 
12.6 50.0 

9.5 59.5 

10.0 43.3 


47.0 84.4 


—5.2* 


-10.4** 


~ 


int 
SUC 
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nut! 


non 
mm Ww 
9 3 0 =! \ CO 
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end 
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con 


wor 


* Differs from zero at .05 level. 
** Differs from zero at .01 level. 
Table 4


Means, Standard Deviations, and Mean Differences of
EPPS Scores under Standard (Std) and Simulated
Good Impression (GI) Conditions


(N = 19)


D 


11.0” 
20.8** 
WE a 
—7.7°° 
- 12.0** 
-4.1° 
2.5 
-8.0** 
1. 
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end 
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con 47 < 


* Differs from 
similarity of individual EPPS profiles within
each sample is shown in Table 5. Score
patterns of individuals show little concordance
in the standard conditions, but have a highly
significant level of concordance in every
simulation condition. This indicates a shift from
an “individual” pattern of responses to a
“role-characteristic” pattern when the S
simulates. Borislow’s (1958) concordance values
for his simulated social desirability (SD) and
personal desirability (PD) groups are
included in the table for comparison. The
present good impression group appears to have
simulated in a more homogeneous fashion
than the earlier SD group.

The large and statistically reliable mean
changes in all samples and the consistent
concordance shifts confirm Hypotheses a and b.
The differences in the mean simulated
patterns and the between-condition correlations
of mean changes (Table 6) confirm
Hypothesis c with one exception. The three
trait-simulation conditions induced mean changes
in the 15 variables which are either essentially
uncorrelated or negatively correlated. In
each case the pattern of prominently elevated
scores is different, and the peak score is on
the relevant variable. However, conditions
ORD and GI yielded changes which are
highly correlated, and mean simulated
profiles which are for practical purposes
indistinguishable.
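
The concordance values reported in Table 5 are Kendall coefficients of concordance (W). A minimal sketch of the computation is given here, assuming each S's profile has already been converted to ranks without ties; the tie correction is omitted, and the example ranks are hypothetical.

```python
def kendalls_w(rank_rows):
    """Kendall's coefficient of concordance for m rankings of n objects.

    rank_rows holds one list of ranks (1..n, no ties) per S. W is 1 when
    all Ss rank the scales identically and 0 when the rank sums are equal.
    """
    m = len(rank_rows)
    n = len(rank_rows[0])
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four scales by three Ss.
identical = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
opposed = [[1, 2, 3], [3, 2, 1]]

print(kendalls_w(identical), kendalls_w(opposed))
```

In these terms, a shift from an "individual" to a "role-characteristic" pattern corresponds to W moving from near 0 toward 1.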

Edwards found relatively low intercorrela- 
tions of the EPPS variables in the normative 
sample. However, the simulation instructions 
in the present experiment induced significant 
changes in scales other than the “primary” 
scale for which the instructions were written. 
One hypothesis which might account for the 
changes in the “nonprimary” scales is that 
these changes relate to the size of the correla- 
tions of the nonprimary scales with the pri- 
mary scale, even though the correlations are 
of a generally low order. The rank difference 
correlations between the amount of change in 
nonprimary scales and the magnitude of the 
normative sample correlations of these scales 
with the primary scale are positive and sig- 
nificant in conditions ORD (rho = .85) and 
DOM (rho = .55), but there is no associa- 
tion in condition CHG (rho= .03). There 
is no immediate explanation for the failure 
of the hypothesis in the CHG condition, al- 
though it may be noted that the score changes 
and the concordance shift are least in this 
condition. 
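
The rank difference correlation (rho) used here can be sketched as follows. The sketch assumes untied ranks and uses made-up rankings rather than the study's values.

```python
def rank_difference_rho(ranks_x, ranks_y):
    """Spearman rank difference correlation for two untied rankings."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five nonprimary scales: amount of change under
# simulation versus size of correlation with the primary scale.
change_ranks = [1, 2, 3, 4, 5]
correlation_ranks = [2, 1, 4, 3, 5]

rho = rank_difference_rho(change_ranks, correlation_ranks)
print(rho)
```

With tied ranks the shortcut formula is only approximate, and a product-moment correlation of the ranks would be the safer computation.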

Hypothesis d is confirmed by the data from 
all four conditions. Although the mean con 
score decreased in all conditions, the decreases 


Table 5 


Kendall Coefficients of Concordance (W) of EPPS
Profiles in Standard (Std) and Simulation (S
Conditions for Four Role-Playing Samples
and for Borislow’s SD and PD Samples


Sample Std 


ORD 
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CHG 
GI 
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Borislow 
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* W Signific 
Table 6


Between-Condition Spearman Correlations of Ranks of
Mean Change Scores for Fifteen Need Variables


Condition 


DOM 02 
CHG .67* 
GI .90* 

ORD 


0 A8 
DON CHG 


* Rho differs from zero at .01 level (no values significant at
.05 level).


are not significant in two groups, and the 
overlap of con scores for simulation and 
standard conditions is large in all groups. 
Even if a liberal cutting score of 10 or less 
raw score points on con is used as a simula- 
tion index (which would identify as “simu- 
lated” 15% of the records in the normative 
sample), only the following relatively small 
proportions of the simulated records would 
be detected: ORD, 5/17; DOM, 6/19; CHG, 
8/20; and GI, 5/17. Unsimulated records 
from these samples misidentified by this cut- 
ting score would be seven, two, seven, and 
four cases respectively. 


Discussion 


One of the most important findings is the 
failure of the social desirability pairings of 
items to control the distorting effect of test- 
taking attitudes. The changes in the GI 
condition tend to support the conclusion of 
Corah et al. (1958) that SD is not equated 
in some of the pairs. Even if the item format 
partly controls social desirability bias, which 
seems likely, the role-playing data suggest 
that distortion of the EPPS by simulation of 
characteristics other than SD remains a dis- 
tinct possibility. 

Since the Ss in the three trait-simulation 
conditions were given descriptions of the vari- 
ables they were to simulate, the question 
arises whether these experimental distortion 
sets are meaningfully related to conscious or 
unconscious role taking in a normal testing 
situation. The reader may verify the degree 
of similarity of the role instructions to the 
content of the EPPS items by reference to the 
scoring keys. Some correspondence was unavoidable
because of the “obvious” character
of the items. In general, however, the role 
instructions make a broader and more ab- 
stract reference to the need variables than do 
the items, suggesting that to some degree true 
role taking rather than information on spe- 
cific item content determined the changes. 
The success of the essentially uninstructed
“scholarship applicants” in Condition GI in
simulating traits (order, achievement, endurance,
and deference) which are both relevant
and “desirable” with respect to the goal they
sought argues rather cogently against assuming
that simulation could not occur except in
Ss with specific information about the instrument.

The evidence for susceptibility to distortion 
gives cause for question of the feasibility of 
constructing an instrument for variables of 
this type without a systematic procedure for 
determining item and scale validities. The 
Manual makes no reference to item selection 
other than for social desirability values. The 
nature and arrangement of the items suggests 
that the questions of subtlety and of com- 
prehensiveness of content were similarly neg- 
lected. Scores for each scale are determined 
by endorsement of a very small number of 
statements (nine), because statements are re- 
peated in identical form in the sets of 28- 
item pairs scored for each scale. The state- 
ments for each variable appear highly “face 
valid” and are strikingly similar in content. 
The effect of face validity and content homo- 
geneity in facilitating selective endorsement 
of a particular kind of item is probably aug- 
mented by the arrangement of the test book- 
let. More than half the statements scored for 
each scale appear in “runs” of five consecu- 
tive item pairs. 

The effect of conscious distortion, the limitations
in content and subtlety of the items,
and the inadequacy of validity data⁵ suggest
there is relatively little basis at present
for regarding EPPS scores as measures with
properties other than those of a self-report.
If this conclusion is correct, interpretation of
a score as a measure of an examinee’s actual
characteristics rests on the assumption that he
is both (a) able to perceive his own characteristics
accurately and (b) willing to report
these perceptions candidly. Meehl’s rationale
for empirically constructed scales, “the scoring
does not assume a valid self-rating to have
been given” (1945, p. 299), cannot be used.
In selection problems, (a) is usually unknown
and (b) often false. In counseling, (b) may
often be assumed, but if (a) is correct the
need for a personality inventory may be vitiated.
The usefulness of earlier personality
inventories dependent on validity of self-report
has been disappointing (Ellis, 1946).


5 The Manual contains no validity data other than
low correlations of the scales with other personality
scales and mention of some inconsistent correlations
with self-ratings. Subsequent studies of concurrent
validity of the EPPS have given partly positive
(Zuckerman, 1958), and partly negative (Dilworth,
1958; Himmelstein, Eschenbach, & Carp, 1958) findings.
Construct validity studies have given mixed
positive and negative findings (Bernardin & Jessor,
1957; Gisvold, 1958; Zuckerman & Grosz, 1958).

A final and important practical implication 
of the findings is that the lack of effective va- 
lidity indices for detecting distorting attitudes 
is one of the most crucial weaknesses of the 
EPPS in its present form. 


Summary 


The simulability of the Edwards Personal
Preference Schedule was studied in four
role-playing experiments. Large and reliable
changes were found in each of three scales 
when Ss were instructed to present them- 
selves as possessing the trait which the scale 
was designed to measure. Changes were also 
found in scores for traits other than the one 
simulated. Simulation of a “good impression” 
yielded substantial and reliable score changes. 
The patterns of mean scores obtained in the 
different role-playing conditions were differ- 
ent. The consistency score did not discrimi- 
nate the simulated records. The results were 
discussed with reference to the failure of the 
attempted control of the social desirability 
factor in eliminating the effect of test-taking 
attitudes, the problem of subtlety, and the
validity and practical usefulness of the
instrument.


THE EFFECTS OF PARTIAL PAIRING ON SCALE
VALUES DERIVED FROM THE METHOD 
OF PAIRED COMPARISONS 


W. W. RAMBO 


Oklahoma State University 


One frequently mentioned criticism of the 
method of paired comparisons is the great 
number of observations required to scale a 
set of stimuli. In an attempt to reduce the 
experimental and computational labor asso- 
ciated with a complete pairing of stimuli, 
there have been four general techniques ad- 
vanced which permit the determination of 
scale values from a partial pairing of stimulus 
objects. With some minor modification the 
computation of scale values is the same for 
these procedures as is that required by a com- 
plete presentation of stimulus pairs, and the 
assumption is made that the obtained values 
are similar to those derived from the com- 
plete pairing. 

One technique requires an initial ordering 
of stimuli by means of some less laborious 
procedure, such as ranking, or the method of 
equal-appearing intervals. A relatively few
stimuli are selected which occupy positions
covering the entire length of the ordered series,
and then all stimuli are compared with
these “representative” standards. A derivative
of this procedure requires judges to compare
only those stimuli which lie fairly close
to one another on the preliminary scale. It 
is assumed that comparisons made between 
stimuli that are widely separated in the se- 
ries would reflect perfect discriminability, and 
therefore they would contribute little to the 
determination of scale values. 

A third procedure requires that the complete
pairing matrix be divided into a number
of submatrices each containing a number of 
common stimuli. Complete pairings are made 
within each submatrix, and the final scale 
values are obtained through adjustments made 
using the several scale values assigned to the 
stimuli held in common by each matrix. 

Finally, a fourth procedure consists of se- 
lecting a random number of pairs from the 
judgment matrix in order to arrive at an
unbiased estimate of the scale values. This
procedure requires less effort since it eliminates
the necessity for preliminary ordering
and adjustment of submatrix scale values. 

For all of the above procedures there is 
little available in the way of empirical evi- 
dence which demonstrates the effects of the 
reduction in the number of pairs on the scale 
values derived from the method of paired 
comparisons. One study reported by Mc- 
Cormick and Bachus (1952) indicates that 
the number of pairs can be drastically re- 
duced without seriously influencing the scale 
values obtained. These authors present a 
method of selecting random pairs which consists
of randomly assigning stimuli to positions
along the borders of the judgment matrix and 
then selecting sets of matrix diagonals so that 
all stimuli enter into the same number of 
comparisons. 
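
The diagonal-selection idea can be sketched in code. The version below is a minimal illustration, not McCormick and Bachus's exact procedure: the function name `partial_pairing`, the restriction to an even number of stimuli, and the choice of which diagonals to use at random are all assumptions made here.

```python
import random
from collections import Counter


def partial_pairing(n_stimuli, pairs_per_stimulus, seed=None):
    """Select pairs along randomly chosen diagonals of the judgment matrix.

    Stimuli are first shuffled into matrix positions. Diagonal d pairs the
    stimulus at position i with the one at position (i + d) % n. For even n,
    each diagonal d < n/2 adds two comparisons per stimulus, and the middle
    diagonal d = n/2 adds one. (Assumption: diagonals are chosen at random.)
    """
    n, k = n_stimuli, pairs_per_stimulus
    if n % 2 != 0 or not 0 < k <= n - 1:
        raise ValueError("this sketch assumes an even n and 0 < k <= n - 1")
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random assignment of stimuli to positions
    full_diagonals = rng.sample(range(1, n // 2), k // 2)
    pairs = []
    for d in full_diagonals:
        pairs += [(order[i], order[(i + d) % n]) for i in range(n)]
    if k % 2 == 1:  # an odd k also needs the middle diagonal
        pairs += [(order[i], order[i + n // 2]) for i in range(n // 2)]
    return pairs


pairs = partial_pairing(30, 15, seed=1)
counts = Counter(stimulus for pair in pairs for stimulus in pair)
print(len(pairs), set(counts.values()))
```

With 30 stimuli, every stimulus enters the same number of comparisons, and a complete schedule (29 pairs per stimulus) yields the full 435 pairs.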

The procedure followed in their investiga- 
tion consisted of instructing judges to rate a 
large number of stimuli which were presented 
in a complete pairing series. Scale values 
were computed from these judgments. Next, 
the number of pairs was systematically re- 
duced by drawing successive subsets of pairs 
from the original matrix, and then scale values 
were recomputed. For one group of 30 stimuli, 
four partial pairing groups were used which
employed 24, 15, 12, and 8 pairs per stimulus. 
Correlations between the scale values obtained 
from these partial pairing schedules and those 
obtained from the complete matrix ranged 
from .996 down to .898. 

One factor completely confounded by the 
procedure followed in the study just cited is 
the rating task presented to the judges. The 
judgments used in the four partial-pairing se- 
ries were extracted from the complete pairing 
matrix, hence all of the scale values obtained 
were derived from sets of judgments that were 
made within the context of the complete pairing
situation. Therefore, fatigue, memory,
and set factors which doubtlessly vary with
the reduction in the number of pairs were
identical in all partial pairing matrices.


Table 1
Complete and Partial Pairing Tasks Assigned
to the Six Experimental Groups

Group    Pairs per Stimulus    Total N of Pairs
Z        29                    435
A        25                    375
B        20                    300
C        15                    225
D        10                    150
E         5                     75


The purpose of this investigation is to de- 
termine the relationship between scale values 
computed from complete and partial pairings 
when the rating task as well as the number 
of observations is permitted to vary with 
the reduction in the number of pairs. Sepa- 
rate groups will be used for each of the partial 
pairing schedules, therefore permitting scale 
values to be determined within the context of
the rating task required by each of these conditions.


Method 


The stimuli considered in this study were 30 nationality
group names which have been extracted
from the Bogardus Scale of Social Distance (1947).
The stimulus pairs were printed on decks of IBM
cards; the order of presentation and location of each
stimulus was determined by tables presented by Ross
(1934). This series yields the maximal separation of
stimulus pairs that have one member in common,
and it counterbalances possible position preferences.
Ten decks of cards were prepared for each of six
experimental groups which reflected a successive reduction
in the number of pairs used to compute the
scale values for these nationality groups. Pairing
schedules were determined following the McCormick
and Bachus procedure. Table 1 shows the number
of cards per deck and the number of observations
per stimulus which went into the definition of the
six rating conditions employed in this study. It will
be noticed that the smallest deck (Condition E)
represents approximately an 82% reduction in the
number of pairs required by the complete pairing
schedule (Condition Z).

The Ss used in this study were 36 male and 24
female white undergraduate students who were enrolled
at Oklahoma State University in an elementary
psychology course. These Ss were randomly assigned
to one of six experimental groups, and each
of the six groups was then segregated so that none
of the Ss in one experimental group was aware of
the difference in rating tasks assigned to the other
groups.

Identical instructions were read to the six experimental
groups. Each S was asked to go through the
deck of cards which was placed in front of him and
judge, for each nationality group pair, which member
of the pair was most preferred by the “average
American.” Ss indicated their judgments by drawing
a line under each nationality group name selected.
The instructions also indicated that the Ss
would be asked to fill out a short questionnaire after
they had finished the rating task. This questionnaire
was the E Scale (Adorno, Frenkel-Brunswik, Levinson,
& Sanford, 1950) which purportedly gives an
estimate of the extent of ethnocentric attitudes held
by the S. The scale was administered in order to
gain information which would reflect on the adequacy
of the randomization procedures used in assigning
Ss to experimental conditions. A single classification
analysis of variance was run to determine
whether or not there were significant ethnocentric
attitude score differences existing between the groups.
The F value obtained (1.29; 5, 54 df) was not significant
at the .05 level.


Results 


For each experimental group scale values 
were determined for each of the 30 stimuli. 
This analysis was carried out under the as- 
sumptions required by Case III of the law of 
comparative judgment. Hence, using formulae
presented by Burros (1951), estimates for
stimulus discriminal dispersions were computed.
Pearson correlation coefficients were
computed in order to estimate the degree of 
association existing between the scale values 
determined by the complete pairing and those 
obtained from each of the five partial pair- 
ings. Table 2 presents the results of this 
analysis. Here it can be seen that these co- 
efficients are quite high for the first three 
partial pairing groups, but they drop off for 
the two smaller pairing schedules. 


Table 2
Correlation Coefficients Estimating Relationship
between Scale Values Obtained from Complete
and Partial Pairing Schedules

[Table values are not recoverable from the source.]


These coefficients were transformed into
Fisher Z values and one-tailed t tests were
run to determine whether there was a significant
reduction in the size of r as the number
of pairs is reduced. Since one variable
(scale values from Group Z) is held in common
by all correlation coefficients this test is
biased on the conservative side.
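
A comparison of two correlation coefficients through the Fisher Z transformation can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the usual normal-deviate form with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3)) and treats the two coefficients as independent, which may differ in detail from the t tests the study ran. The two coefficients in the example are illustrative inputs, each taken to rest on 30 stimuli.

```python
import math


def fisher_z(r):
    """Fisher's variance-stabilizing transform of a correlation coefficient."""
    return math.atanh(r)


def z_difference(r1, r2, n1, n2):
    """Normal deviate for r1 - r2, treating the two samples as independent."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative coefficients only (an assumption), each based on 30 stimuli.
z = z_difference(0.98, 0.93, 30, 30)
print(round(z, 2), round(one_tailed_p(z), 3))
```

Because the coefficients in the study share the Group Z scale values, they are not independent, and a test that ignores this overstates the standard error of the difference.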

The results of this analysis indicated that 
there were no significant differences between 
the coefficients obtained from Groups A, B, 
and C. However, a significant difference was 
obtained between the coefficients computed 
from Pairing Schedules C and D. This difference
was significant at the .05 level. The
difference between Schedules D and E was 
not significant at the chosen significance level. 

Discussion 

The magnitude of the coefficients obtained 
from relating the partial pairing scale values 
with those obtained from the complete pair- 
ing schedule indicates that the method of 
paired comparisons can withstand a substan- 
tial reduction in the number of pairs without 
seriously altering the scale values obtained. 
It will be recalled that scale values derived 
from a partial pairing schedule which yielded 
approximately a 50% reduction in the num- 
ber of pairs still correlated .93 with the scale 
values obtained from a complete pairing 
schedule. This coefficient, while high, is sig- 
nificantly lower than those reported by Mc- 
Cormick and Bachus for a similar reduction 
in the number of pairs (.99 and .98). Of 
course, the willingness of an investigator to 
accept alterations in scale values depends 
upon the accuracy required by the purposes
of his research. However, in many applied 
situations the time and labor savings associ- 
ated with a 50% reduction in the number of 
observations would far outweigh this distor- 
tion of scale values. Beyond this point, how- 
ever, the results of this study indicate that 
further reduction yields significant modifica- 
tions in scale value estimates. 

It is the contention of this paper that par- 
tial pairing is more than a statistical consid- 
eration. Although the reduction in the num- 
ber of observations obviously influences the 
stability of the scale values, it also results in 
a modification of the over-all rating task required
of the judges. For instance, the initial
task set is conceivably modified when raters 
are presented with a smaller number of pairs. 
Doubtlessly, the influence of certain fatigue or 
boredom factors is modified by reducing the 
number of pairs, and it is quite probable that 
memory for past judgments plays a more sig- 
nificant role in partial pairing schedules since 
fewer judgments intervene between the successive
appearances of a given stimulus. Therefore,
the investigator should consider the experimental
implications of these modified task
variables before he decides on a substantial
reduction in the number of pairs he employs
in his paired comparisons scaling.


Summary 


The study just reported represents an at- 
tempt to describe the relationship between 
scale values derived from partial and com- 
plete paired comparisons judgments. One 
complete and five partial pairing schedules 
were used, and 60 Ss, who had been randomly 
assigned to one of these six conditions, scaled 
the names of 30 nationality groups. 

The results of the study indicated that partial
pairing scale values rather closely approximated
those obtained from a complete pairing
when the number of observations was reduced
by as much as 50% of the complete pairing
matrix. Beyond this point further reduction
seemed to yield a more drastic modification 
in the scale values obtained. The rating task 
implications of partial pairing were discussed. 


Received January 26, 1959.


Increasing interest is being directed to the
possibility that subliminal presentation may
be useful, or effective, in such activities as
advertising, attitude modification, or persuasion.
Several recent evaluations (McConnell,
Cutler, & McNeil, 1958; Naylor & Lawshe, 
1958) of available scientific evidence suggest 
that further experimentation is necessary to 
demonstrate the validity of claims that be- 
havior can be influenced by subliminal stimu- 
lation. 

The following experiment was designed and 
conducted to determine if a visual stimulus 
presented subliminally to a group of Ss would 
influence recognition or association responses. 
These responses were obtained immediately 
after the presentation of the stimulus. If responses
were found to be related to the stimulus
presentation, then “subliminal perception”
could operationally be said to have been
demonstrated. By defining subliminal perception
in this manner, interpretations would not be 
extrapolated beyond the data collected. 

If the phenomenon of subliminal perception 
actually occurs, its effects should appear in 
the relatively simple responses of recognition
and association. Investigations of the effect 
of subliminal perception on more complex be- 
havior, such as learning, persuasion, and atti- 
tude change, would be more legitimate after 
demonstrating its effect on less complex re- 
sponses. 


Procedure 


Apparatus. A 16 mm. film projector was used to 
project a 30-min. film based on a sales administra- 
tion textbook entitled The Bettger Story. A slide 
projector capable of projecting slides of 3” X 5” was 
used to flash slide figures on the screen during the 
showing of the film by means of an attached lens 
with a shutter. The shutter permitted discrete ad- 
justments for speed and continuous adjustments for 
aperture opening. Two slides were prepared, one to
serve as an experimental stimulus and the other as
a control stimulus. Slide A, the experimental stimulus,
was an original drawing of a spoon of rice with
the words “Wonder Rice” printed below on a black
background. Slide B, the control stimulus, merely
consisted of four lines placed in a nonsensical manner
on a black background. Slide A was prepared
to be presented to the experimental group and Slide B
to the control group. Slide B was intended to be
nonsensical in nature and used purely for the purpose
of reproducing conditions experienced in the experimental
group as nearly as possible, with the exception
of the actual meaningful stimulus.

Subjects. Two groups of Ss enrolled in a sales
and advertising class at Purdue University were used
as a control group and an experimental group. The 
administration of the experiment took place during 
two consecutive class periods on the same day. 

Method. For the purposes of this study subliminal
perception was defined as the presentation of a
stimulus visible under constant exposure at such a
speed as to bring it below the threshold of conscious
awareness. There were three variables to be
coordinated in the presentation of the slide stimulus 
in order to achieve subliminal perception. The vari- 
ables were exposure time, aperture opening, and slide 
construction. 

A lens capable of shutter speeds of .01 sec. was 
attached to the turret of the slide projector. The 
slide projector lens was set at a speed of .01 sec.
and the aperture opening reduced gradually until the 
slide was no longer visible when flashed on the screen 
while the film was being shown. In order to prevent 
the film from masking the stimulus (figures from the 
slides) it was necessary to ensure that the stimulus 
was visible under constant exposure when superim- 
posed on the film being presented. It was found 
that this requirement could be achieved when slides 
with white background and black figures were superimposed
on the film by adjusting exposure time and
aperture opening. However, a white flicker on the 
film was detectable when this type of slide was pre- 
sented at the relatively slow speed of .01 sec. The 
flicker was eliminated by redesigning the slides so
that the background was black and the figures were 
white. Thus it was assured that a stimulus projected 
at the set aperture opening was being reflected from 
the screen with subliminal presentation resulting as 
a function of shutter speed. 

Table 1

Tabulations of Questionnaire Responses

[Cell frequencies are not recoverable from the source.
Columns: Monarch, Wonder, Total. Rows: Yes, No, and
Total, for the control group and for the experimental
group.]


The slides presented to their appropriate groups
were projected for a duration of .01 sec. at 10-sec.
intervals during the 30-min. film. At the conclusion
of the film a questionnaire was given the Ss for the
purpose of determining the effect of the subliminal
presentation. The questionnaire consisted of a reproduction
of the illustration used in Slide A, that of
the spoon of rice, but this time the words “Wonder
Rice” were omitted. The following was also included
on the questionnaire:


At one time you may have seen the above advertisement
illustration used to promote the sale of
rice. Check below according to your recognition
of this advertisement.

___ YES
___ NO

Regardless of whether you recognize the illustration,
check the brand from the following list which
you believe to be most likely associated with the
illustration.

___ MONARCH
___ WONDER


The first questionnaire response was designed to 
indicate whether the Ss recognized the stimulus fig- 
ure. The second response was designed to indicate 
whether the Ss associated the hypothetical figure 
with the brand name. The brand names used were 
selected on the basis of brands believed to be some- 
what common to the Ss but not predominant. 

The Ss had no knowledge that an experiment was 
being conducted. They were only told that a film 
related to their course in sales and advertising was 
being presented. In an attempt not to arouse suspicions
regarding the questionnaire, the Ss were told
that the questionnaire was being presented as a corol- 
lary to other requirements of their course work. 


Results 


Answers to the questionnaire items were 
tabulated for the control and experimental 
groups. These tabulations are shown in
Table 1. The chi-square technique was used
to evaluate the differences between the ex- 
perimental and control groups according to 
their recognition of the stimulus figure. 
Table 2 shows the results of the test of the 
hypothesis that the experimental group was 
drawn from a population having proportion- 
ately the same frequency breakdown as the 
control group. The resulting chi-square value 
was not significant. 

The chi-square technique was again used to 
compare the experimental and control groups 
according to their association of brand names 
with the stimulus figures and the results are 
given in Table 3. Not only were the answers 
to the items not significantly different for the 
two groups, but responses for both items were 
in the “wrong” direction. For example, in 
the experimental group significantly more peo- 
ple (at the .05 level) checked the incorrect 
brand name than would have been expected 
to do so by chance. This indicated that a 
pre-experimental bias may have existed and 
illustrates the necessity for using a control 
group rather than merely testing for signifi- 
cant deviations from chance expectancies. 

The chi-square values reported in Tables 2 
and 3 were not corrected for lack of continuity
in the discrete frequencies as is usually done
for chi-square tests with only one df. A correction
for continuity would merely reduce
the chi-square values which are already too 
small to be significant. 
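
The brand-association comparison can be checked directly from the frequencies reported in Table 3 (observed 14 and 5; expected 15.2 and 3.8). The sketch below recomputes the uncorrected chi-square and, as a second step, compares the same observed frequencies with an even split. The function names are illustrative, and the even 9.5/9.5 chance expectation is an assumption about how the chance comparison was set up.

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square statistic, without continuity correction."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with one df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


observed = [14, 5]              # experimental group: Monarch, Wonder
expected_control = [15.2, 3.8]  # control proportions scaled to n = 19
expected_chance = [9.5, 9.5]    # assumed even split under chance

vs_control = chi_square(observed, expected_control)
vs_chance = chi_square(observed, expected_chance)

print(round(vs_control, 3), round(p_value_1df(vs_control), 2))
print(round(vs_chance, 2), round(p_value_1df(vs_chance), 3))
```

The first statistic reproduces the reported value of .474 with p > .30. The second exceeds 3.841, the .05 critical value for one df, in line with the statement that responses deviated from chance toward the incorrect brand.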


Conclusions 


The results of this study indicate that sub- 
liminal presentation had no effect on the re- 
sponses of the Ss in recognizing the stimulus 
figure or of associating the brand name with 
the stimulus figure. It was felt that subliminal
perception should be demonstrated at this
level of response before investigating the effects
of subliminal presentation on more complex
behavior, such as buying specific products
or influencing public opinion.


Table 2
Control Group vs. Experimental Group on
Recognition of Stimulus Figure

Response                    Total
Observed (experimental)     19
Expected (control)          19

[The Yes and No frequencies and the uncorrected
chi-square value are illegible in the source.]


Table 3
Control Group vs. Experimental Group on Association
of Brand with Stimulus Figure

Response                    Monarch    Wonder    Total
Observed (experimental)     14         5         19
Expected (control)          15.2       3.8       19

χ² (uncorrected) = .474; p > .30


Although responses to the questionnaire 
deviated significantly from a chance distribu- 
tion, the responses were in the opposite direc- 
tion from what would be expected if sublimi- 
nal perception existed, and the experimental 
group responses did not differ significantly 
from the control group responses. 

There may be subliminal “thresholds” of 
perception, similar to the conventional thresh- 
old or limen for the awareness of sensations. 


and Weld W. Turner 


Failures to produce evidence of subliminal 
perceptions could be attributed to stimulus 
presentations at levels below the subliminal 
“threshold.” Limitations in the flexibility of 
available equipment prevented the present au- 
thors from searching for thresholds of sub- 
liminal impressions. 

If subliminal perception occurred, it did not 
affect questionnaire responses. Subliminal per- 
ception could conceivably be demonstrated 
with a method similar to that used here by 
presenting the stimulus more frequently or 
by simplifying the design of the stimulus. 
However, in view of the results of this study, 
it appears that the burden of proof is placed 
on those who insist that subliminal perception 
is capable of influencing behavior. 


Received February 9, 1959.


Use of the lie detector depends on the assumption
that there is a distinctive pattern
of physiological response which accompanies 
lying and which can be distinguished from 
that which accompanies truth telling. Most 
modern lie detector operators expect lying to 
produce a greater amplitude of physiological 
response, although others have asserted that 
certain qualitative differences are character- 
istic (e.g., Marston, 1938, p. 52; Summers, 
1939). Claims of high validities for these 
methods do not find support in properly con- 
ducted empirical study. The most extensive 
research thus far reported (Ellson, Davis, 
Saltzman, & Burke, 1952), which employed 
a total of 13 response variables and careful 
multivariate statistical analysis, achieved only 
73% correct classification, against a chance 
expectancy of 25%. 

Use of physiological measurements to de- 
tect not lying, but the presence of “guilty 
knowledge,” requires only the more reason- 
able assumption that a guilty person will 
show some involuntary physiological response 
(e.g., GSR) to stimuli related to remembered 
details of his crime. If the crime is such that 
the investigator can discover a number of 
factual details with which only the guilty 
person should be familiar, then the guilty 
knowledge method can be used. The guilty 
knowledge items are interspersed with other 
similar but irrelevant items in a stimulus list. 
The S is told that E is going to mention a 
number of items and that, if he is guilty, he 
will recognize some of these as being related 
to the crime in question. The items may be 
stated in question form, in which case the S 
may or may not be required to answer. 

A guilty S, knowing which items are rele- 
vant and which are not, would be expected to 
respond differently to the relevant than to the 
irrelevant items. Usually, he would be ex- 
pected to give larger responses to the relevant 
items, although it should be pointed out that
any consistent difference in the responses to
the two classes of stimuli is evidence of guilt.
Thus, an S who manages by self-stimulation
to produce large GSRs to the irrelevant items
is betrayed by the fact that his responses to
the relevant items are consistently smaller.


1 Richard Rose, George Skaff, and Joe Ylitalo conducted
this experiment.
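
The logic of the guilty knowledge method can be illustrated with a simple chance calculation. The sketch below rests on assumptions that are not taken from the text: a question is counted as a "hit" when the relevant alternative draws the largest response, an innocent S is equally likely to respond most strongly to any alternative, and the example uses six questions of five alternatives each. The function name is hypothetical.

```python
from math import comb


def prob_at_least(hits, n_questions, n_alternatives):
    """Chance that an innocent S 'hits' the relevant alternative on at
    least `hits` of `n_questions`, if every alternative is equally likely
    to draw the largest response (an assumed scoring rule)."""
    p = 1.0 / n_alternatives
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(hits, n_questions + 1)
    )


# Assumed example: six questions, five alternatives each.
for hits in range(7):
    print(hits, round(prob_at_least(hits, 6, 5), 4))
```

Under these assumptions the probability of many hits by chance falls off quickly, which is why a consistent pattern across several items carries more weight than a response to any single item.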


Method 


Ss used in this experiment were 49 male college 
students who were assigned at random to four 
groups. Those in Group 1 (13 Ss) were required to 
enact two mock crimes in sequence, a “murder” and 
a “theft.” For the Murder enactment, S was taken 
to the second floor of the building and required to 
knock on the door of one of the offices. The door 
was opened by an assistant who, after some preliminary
conversation, invited S to play a hand of poker,
which was thereupon dealt out, the assistant getting 
the better hand. Remarking that S now owed him 
a hundred dollars, the assistant then walked over to 
stand looking out the window. Taking a weapon 
from his pocket, S went through the motions of 
killing the assistant, hid the weapon in a drawer of 
the desk, and left the office.

In the Theft enactment, S had to idle near the 
doorway of a different office until the occupant, a 
woman, left it to go into the washroom. S then 
hurriedly entered and riffled through the desk calen- 
dar until he found a page on which his own name 
had been entered. He erased the name and then 
searched through the desk until he found the article 
(e.g., a watch) which he had been instructed to 
“steal.” Leaving the office, he hid the stolen prop- 
erty in a locker in the hallway. 

As already mentioned, Ss in Group 1 enacted both 
of these mock crimes, in random sequence. Those 
in Group 2 enacted only the Murder, those in Group 
3 only the Theft, and those in Group 4 were exposed 
to neither of the crimes. The next step was for S 
to be turned over to another E for interrogation. E 
was not informed to which group S belonged. S 
was seated in the interrogation room, GSR electrodes
attached to his dominant hand, shocking electrodes
to his other hand, a blindfold put over his
eyes and a pair of headphones adjusted to his ears.
E was located with the apparatus in an adjoining
room and spoke to S via a microphone.

Each S was told that he was to be questioned in 
relation to two crimes. He was instructed to listen 
to each question but not to reply to any of them.
He was told that each question consisted of several 
parts and that if, at the end of any question, E felt 
that the physiological response (GSR) indicated guilt, 
then S would be given an electric shock. The shock 
was then demonstrated, most Ss finding it to be
quite unpleasant (the shock was the discharge of a
2-mfd. capacitor, charged to 300 v., through @ in. in
diameter electrodes on the fingerprint area of the 
first and third fingers). In fact, irrespective of the 
particular S’s response, the shock was always given 
following the completion of the GSR to the last part 
of Questions 2, 3, and 5 of the Murder list and Ques- 
tions 1, 3, and 4 of the Theft list. (The purpose of 
the shock was merely to increase S’s general anxiety 
level and increase to some extent his motivation not 
to give a guilty record and thus to create a situation 
resembling a little more that of real criminal interro-
gation.)

Both interrogation lists were standard and each 
consisted of six multiple-choice-type questions. E 
first read the question and then read each of the 
short alternative answers, allowing sufficient time 
after each for GSR activity to dissipate. One al- 
ternative for each question was relevant for a given 
S. Two of the six Murder questions were as follows: 


(1) If you are the murderer, you will know 
that there was an unusual object present in the 
murder room. Was it (a) a record (b) an easel 
(c) a candy box (d) a chess set?; (2) The mur- 
derer hid the weapon in one of the drawers of a 
desk. Which drawer was it? Was it the (a) 
upper left (b) lower right (c) upper right (d) 
middle (e) lower left?


Two of the six Theft questions were as follows: 


(1) If you are the thief, you will know where 
the desk was located in the office in which the 
theft occurred. Was it (a) on the left (b) in 
front (c) on the right?; (2) The thief hid what 
he had stolen. Where did he hide it? Was it 
(a) in the men’s room (b) on the coat rack (c)
in the office (d) on the window sill (e) in the
locker?


The number of alternatives averaged 4.67 in the 
Murder list and 5.0 in the Theft list. Questions 2, 
3, and 6 in the Murder list and 2, 3, 4, and 6 in the 
Theft list were “double-blind,” that is, the relevant 
or guilty alternative was varied at random from S 
to S so that E did not know which was which. 
Questions were always given in the same order 
within a list but whether the Murder or Theft list 
was given first was determined at random.

Scoring was simple, a priori, and objective. An 
S’s GSRs to the several alternatives in a given ques- 
tion were ranked in order of amplitude. If his larg- 
est response was to the relevant alternative, he was 
given a score of 2 on that question. If his second 
largest response was to the relevant alternative, he 
was given a score of 1. Thus, a perfect Innocent 
score was 0 and a perfect Guilty score was 12, for 
both lists. 
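The scoring rule just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the input layout (one list of GSR amplitudes per question plus the index of the relevant alternative), and the handling of tied amplitudes are assumptions, not details given in the report. The cutoff of 6 is the one applied in the Results.

```python
def score_question(amplitudes, relevant_index):
    """Score one question: 2 if the relevant alternative drew the largest
    GSR, 1 if it drew the second largest, 0 otherwise.

    Ties are broken in favor of the earlier alternative (an assumption;
    the report does not say how ties were handled).
    """
    # Indices of the alternatives ordered from largest to smallest response.
    order = sorted(range(len(amplitudes)),
                   key=lambda i: amplitudes[i], reverse=True)
    rank = order.index(relevant_index)
    if rank == 0:
        return 2
    if rank == 1:
        return 1
    return 0


def score_list(questions):
    """Sum the question scores for one interrogation list.

    `questions` is a sequence of (amplitudes, relevant_index) pairs.
    With six questions the total runs from 0 to 12.
    """
    return sum(score_question(a, r) for a, r in questions)


def classify(total, cutoff=6):
    """Classify a list total: above the cutoff is 'guilty'."""
    return "guilty" if total > cutoff else "innocent"


# Hypothetical record: the relevant alternative (index 1) gives the
# largest response on every one of six questions.
demo = [([0.4, 1.9, 0.7, 0.5], 1)] * 6
demo_total = score_list(demo)
demo_label = classify(demo_total)
```

Because each S is ranked only against his own responses, the absolute size of the GSRs does not enter the score.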


Results 


If all scores of 6 or less are classified “inno- 
cent” and all those over six “guilty,” then 
four Ss from Group 1 and one from Group 2 
would be misclassified as to group, a total of 
5 misses out of 49, or 89.8% hits. Consider- 
ing the two crimes separately, there were 50 
interrogations of Guilty Ss (the 24 Ss from 
Groups 2 and 3 plus the 13 Ss from Group 1 
who were Guilty of both crimes), and 48 in- 
terrogations of Innocent Ss (the 24 Ss from 
Groups 2 and 3 plus the 12 Ss from Group 4 
who were Innocent of both crimes). Forty- 
four of the 50 interrogations of Guilty Ss re- 
sulted in scores of 7 or higher, all of the 48 
interrogations of Innocent Ss gave scores of 6 
or lower, a total of 93.9% correct classifica-
tion.
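The two hit rates follow directly from the counts given above. The short Python check below is a sketch; the variable names are illustrative, and the counts are the ones stated in the text (49 Ss with 5 misses; 50 Guilty and 48 Innocent interrogations with 44 and 48 correct, respectively).

```python
def percent_correct(hits, total):
    """Percentage of correct classifications, rounded to one decimal."""
    return round(100.0 * hits / total, 1)


# Classification of Ss by group: 5 misses among 49 Ss.
by_subject = percent_correct(49 - 5, 49)

# Crimes considered separately: Groups 2 and 3 contribute 24 Guilty and
# 24 Innocent interrogations; the 13 Group 1 Ss contribute two Guilty
# interrogations each, and the 12 Group 4 Ss two Innocent ones each.
guilty_interrogations = 24 + 2 * 13
innocent_interrogations = 24 + 2 * 12
by_interrogation = percent_correct(
    44 + 48, guilty_interrogations + innocent_interrogations)
```

Under these counts the first rate comes to about 89.8% and the second to about 93.9%.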


Discussion 

It should be emphasized that these results 
by no means represent the upper limit of va- 
lidity that could be achieved with the simple 
and objective guilty knowledge technique. 
On the other hand, one must consider whether 
results from such a laboratory study can 
safely be extrapolated to the real life crimi- 
nal interrogation situation. Some of the points 
that might be raised in this connection are 
discussed below. 

1. All Ss in the real life situation would be 
more emotionally involved in the outcome. 
The use of electric shock in the experiment 
was intended to make the situations some- 
what more comparable in this respect, but 
certainly an important difference still re- 
mained. However, because of the nature of 
the guilty knowledge method, an increase in 
general emotional reactivity in either an in- 
nocent or a guilty S does not in itself affect 
the validity of the test. As long as S is able 
to comprehend the situation and to respond 
more intensely to a question having some spe- 
cial significance for him than he does to most 
of the questions, the method is not com- 
promised in its ability to differentiate inno- 
cence from guilt. 

2. The Ss in this experiment were not par- 
ticularly sophisticated concerning the method 
being used and were not strongly motivated, 
if guilty, to try to defeat the test. There is 
no way in which an S, once he has perceived 
a stimulus, can inhibit what would be his 
normal GSR to that stimulus. However, it 
is possible to try to defeat the guilty knowl-
edge type of test by producing intentional 
or artificial responses to the nonsignificant 
stimuli so as to reduce the relative size of 
the involuntary guilty response and so con-
fuse the record. Artificial GSRs can be pro-
duced in various ways by a sophisticated S.
However, because the GSR is peculiar in that 
it does not produce any proprioceptive stimu- 
lation, it is not possible for a subject to know 
whether his attempt to produce a deliberate 
response has been successful and it is cer- 
tainly impossible for him to deliberately pro- 
duce responses of controlled sizes. Still, it 
remains to be experimentally determined to 
what extent a sophisticated, motivated S can 
confuse in this way a guilty knowledge rec- 
ord. A second experiment is in progress 
which is concerned with this problem. 

3. The Ss in this experiment were college 
students and hardly representative of the av- 
erage run of criminal suspects; perhaps a pro- 
portion of the latter would not respond “nor- 
mally” in such a test. Again, a final answer 
to the question suggested can only be pro- 
vided by an appropriate experiment. The 
literature of lie detection does include refer- 
ences to the problem of the nonreacting S. 
However, in contrast to lie detection pro- 
cedures, the guilty knowledge method, which 
uses each S as his own control, does not re- 
quire that the responses of the guilty S be 
comparable in any way to those of the inno- 
cent, but merely that the guilty S respond 
differently to some of the items than he does 
to others—something which the innocent S 
cannot consistently do. It is interesting to 
note in this connection that one of the Ss in 
Group 1 was a Hungarian expatriate who, 
while engaged in underground activities sev- 
eral years earlier, had been arrested and sub- 
jected to intensive interrogation by Russian 
secret police. Although he had been success- 
ful then in maintaining his forged identity 
and in convincing the MVD that he was ig- 
norant of any underground activities, he was 
easily identified by the guilty knowledge test 
as being guilty of both murder and theft! 

4. The Ss in this experiment spent only a 
few minutes in the mock crime situations and 
therefore had little opportunity to note the 
details of the situation which was used for 
the guilty knowledge test. It was no surprise 
to find that many Ss who were guilty of the
murder, for example, reported after the inter-
rogation that they had not noticed the map
on the wall of the Murder room, or the chess
set on the bookcase, etc. Real life crime
situations would obviously vary enormously 
among themselves in this respect. A suspect 
who is accused of having robbed a series of 
liquor stores can safely be assumed to know, 
if he is guilty, a number of things which an 
innocent person would not, such as the loca- 
tions and appearances of the stores, the 
amounts taken, the appearance of the various 
victims, certain striking facts about what was 
said or done during the robberies, and so on. 
On the other hand, the question at issue might 
be which one of a group of armed thieves fired 
a fatal shot. In such a case, the guilty indi- 
vidual would not be expected to possess any 
guilty knowledge not shared by his confeder- 
ates and/or the other suspects, and the pres- 
ent method would not be of any use. (Obvi- 
ously, each suspect might be expected to give 
a larger response to the name of the guilty 
one than to the other names, his own ex-
cluded. Such consistency would, if found, 
rather clearly identify the guilty individual. 
However, such a method cannot have the cer-
tainty of the guilty knowledge technique.)
It seems reasonable to suppose that many 
real life crimes would lend themselves to the 
use of the guilty knowledge method, keeping 
in mind that trivial and seemingly irrelevant 
details are as useful as interrogation stimuli 
as are the more obvious facts, such as the 
weapon used, the article stolen, etc., which 
might be passed on to innocent suspects by 
the newspapers or the arresting officers and 
thereby made useless for this purpose. It 
also seems reasonable that, in such cases, the 
guilty person might be expected to have a 
wider range of guilty knowledge than was in-
duced in the subjects of the present experi-
ment.


5. Since only about 15 min. of interroga- 
tion time and only six questions were used in 
the interrogation for each of the mock crimes, 
it can be assumed that a higher validity could 
easily be achieved by a longer interrogation, 
using questions more than once and using a 
greater variety of questions. With only six 
questions and the simple scoring system used 
here, about one S in 50 might be classified 
guilty though actually innocent, due to chance 
fluctuations. The probability of such false- 
positive misclassification decreases rapidly as 
the number of questions is increased. Thus, 
with only 10 questions, having five alterna- 
tives each, less than 3.28% of innocent Ss 
will show guilty responses on more than four 
questions and less than 0.64% on more than 
five. (These figures assume that the ques- 
tions are well enough constructed so that the 
probability of an innocent S reacting most 
strongly to the relevant alternative is about 
equal to that for the mean of the other al-
ternatives.)
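The two percentages quoted for the 10-question case can be checked with a binomial tail computation. The Python sketch below assumes, as the parenthetical note does, that an innocent S reacts most strongly to the relevant alternative with probability 1/5 on each five-alternative question, and that the questions are independent; the function name is illustrative.

```python
from math import comb


def tail_probability(n, p, more_than):
    """P(X > more_than) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(more_than + 1, n + 1))


# Ten questions, five alternatives each: chance of a "guilty" response
# (largest GSR to the relevant alternative) is 1/5 per question.
p_more_than_four = tail_probability(10, 0.2, 4)   # about 0.0328
p_more_than_five = tail_probability(10, 0.2, 5)   # about 0.0064
```

Both values fall just under the stated bounds of 3.28% and 0.64%.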

6. The scoring system used in this experi-
ment was simple and did not involve any at-
tempt to defend against the possibility of
S making deliberate responses in order to de- 
feat the test. The guilty knowledge method 
does not require one to assume that the guilty 
S will tend to give larger reactions to the 
relevant items, although the present scoring 
system did require this result. All that need 
be assumed is that the guilty S will react 
differently to the relevant items, as a group, 
than he does to the irrelevant alternatives. 
The only way in which an S can behave con- 
sistently differently with respect to the set 
of relevant alternatives than he does to the 
others is by having some way of distinguish- 
ing these alternatives from the rest, i.e., by 
having the guilty knowledge which declares 
him to be guilty in fact. In a situation where 
active attempts by a sophisticated S to de- 
feat the test are to be expected, then a more 
subtle scoring system than the one used above 
should yield a higher validity. 


David T. Lykken


Summary 


Forty-nine male college students, after ran- 
dom assortment into four groups, were re- 
quired to enact one, both, or neither of two 
mock crimes. All were then given a guilty 
knowledge test, employing the GSR, which 
used six standard questions relating to each 
of the two crimes. A simple, objective, and 
a priori scoring system was used to determine 
guilt. Forty-four or 89.8% of the Ss were 
assigned to their correct group, against a 
chance expectancy of 25%. Considering the 
crimes separately, all Ss innocent of a crime 
were correctly classified, while 44 of 50 inter-
rogations of Guilty Ss gave guilty classifica-
tions, a total of 93.9% correct classification
against a chance expectancy of 50%. 

Lie detection, requiring unreasonable as- 
sumptions about the consistency of physio- 
logical response patterns, has not been shown 
by acceptable research to have the high va- 
lidity claimed for it and which is necessary 
for its useful application. Detection of guilty 
knowledge, while less widely applicable, is a 
more reasonable, objective, and generally de- 
fensible technique and is demonstrably ca- 
pable of very high validity in those situations 
where it can be used. 


DURATION OF MOVEMENTS IN A DIAL SETTING
TASK AS A FUNCTION OF THE PRECISION
OF MANIPULATION 1


J. RICHARD SIMON AND BETTY PEARL SIMON


State University of Iowa


This study deals with the interrelation of 
the component movements in a dial setting 
task. The variable manipulated is the pre- 
cision or accuracy required to set the dials. 

Psychologists and engineers have long rec- 
ognized that an important factor in determin- 
ing the duration of a task is the precision 
which the task demands of the operator. 
Nevertheless, there has been little systematic 
research on the effects of precision as a vari- 
able on movement times. Predetermined mo- 
tion time systems (Maynard, 1956, Sec. 4) 
consider the precision, or manual control, or 
difficulty of a movement component in set- 
ting the time standard for that component. 
Little is known, however, of the effects which 
variation in the precision requirements of one 
part of a movement have on the durations of 
other parts of the same movement. 

In the present study, the precision required 
to adjust each of two dials on a simplified 
control panel was systematically varied while 
all other characteristics of the control move- 
ment were held constant. In this way, the 
effects of precision of manipulation on the 
durations of other parts of the motion cycle 
were investigated. Though the results of this 
research may have practical applications in 
the setting of time standards, this was not its 
primary objective. This study is one of a 
series (Davis, Wehrkamp, & Smith, 1951; 
Harris & Smith, 1954; Rubin, Von Trebra, & 
Smith, 1952; Simon & Smader, 1955; Simon, 
1956; Wehrkamp & Smith, 1952) aimed at 
gaining a more complete understanding of hu- 
man manual movements through the system- 
atic identification and delineation of the vari- 
ables which affect movement duration. 


1 Data for this paper were collected during 1955-
56 while the senior author was a Fulbright research
scholar at the University of Cambridge, England.

The authors are indebted to Alan T. Welford for 
his cooperation and assistance and to Karl U. Smith 
for the loan of the recording apparatus. 


Method 


Apparatus. Figure 1 pictures the main elements of 
the dial setting task. S is shown seated in front of 
the control panel adjusting one of the two dials. His 
task was simply to adjust the dials one after the 
other as rapidly as possible. An electronic motion 
analyzer, described previously (Simon & Smader, 
1955) was used to record separately and automati- 
cally the durations of the parts of the dial setting 
task. By grasping and turning the right dial, S com-
pletes a circuit and current is supplied to a precision
timer which records the duration of that manipula-
tion to .01 sec. When S releases the dial and moves
toward the left dial, the first clock stops and a second 
clock begins which records the duration of the right 
to left travel movement. S’s contact with the left 
dial stops the travel clock and starts a third clock 
which records the time taken to adjust the left dial.
A fourth clock similarly records the duration of the 
left to right travel movement. The duration of each 
successive manipulation or travel movement is ac-
cumulated on one of these four timers.
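The logic of the four accumulating timers can be sketched in Python as follows. This is a software analogy under stated assumptions, not a description of the analyzer's circuitry: the event format (time in seconds, dial name, and "contact" or "release") and all names are hypothetical.

```python
def accumulate_durations(events):
    """Accumulate time on four clocks from a sequence of dial events.

    `events` is a time-ordered list of (time_sec, dial, action) tuples,
    with dial in {"left", "right"} and action in {"contact", "release"}.
    Contact with a dial starts that dial's manipulation clock; release
    starts the clock for travel toward the other dial. Each new event
    stops whichever clock is running.
    """
    totals = {
        "right_manipulation": 0.0,
        "right_to_left_travel": 0.0,
        "left_manipulation": 0.0,
        "left_to_right_travel": 0.0,
    }
    running = None   # name of the clock currently running
    started = None   # time at which that clock was started
    for time_sec, dial, action in events:
        if running is not None:
            totals[running] += time_sec - started
        if action == "contact":
            running = dial + "_manipulation"
        else:
            other = "left" if dial == "right" else "right"
            running = dial + "_to_" + other + "_travel"
        started = time_sec
    return totals


# Hypothetical event record for one right-dial and one left-dial setting.
demo_events = [
    (0.0, "right", "contact"),
    (1.0, "right", "release"),
    (1.5, "left", "contact"),
    (2.5, "left", "release"),
    (3.0, "right", "contact"),
]
demo_totals = accumulate_durations(demo_events)
```

In this sketch the clock started by the final event is never stopped, so its time is not counted.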

The two types of dials used in the experiment are 
also pictured in Fig. 1. Two dials of Type A and 
two dials of Type B were used. The diameter of 
each dial measured 4 in. The diameter of the 
knurled knob which S grasped to manipulate the 
dial was 14 in. and extended i in. from the dial face. 
Considering the 12 o’clock position as 0°, the four 
white marks on each dial face were placed at 0°, 70°, 
170°, and 285°. The white marks on the face of 
Dial Type A were narrow (2° of arc). The marks 
on the face of Dial Type B were wide (20° of arc). 
Aligning a mark on the face of Dial Type A with 
the target mark (2°) above the dial required S to 
make a fine adjustment, while aligning a mark on 
the face of Dial Type B with the target mark (2°) 
required only a gross adjustment. 

All dials and mountings were precision machined 
and aside from the difference between Type A and 
Type B in the width of the marks on their faces, 
the four dials were alike. The dials rotated easily, 
but there was sufficient friction so that S’s release of 
the dial after making a setting would not throw the 
setting out of line. 

The S’s task was to set the dials alternately, first 
the right dial and then the left dial, etc., until each 
dial had been set 12 times. S used his right hand 
only. To set a dial he had to rotate it approxi-
mately 1/4 turn in order to align the next mark on
the dial face with the target mark at the 12 o’clock 
position above the dial. During a trial, each dial, 
whether Type A or Type B, was rotated through 
three complete turns. Thus, the precision of the 
manipulative part of the task was varied without 
altering the extent of the manipulation. To begin a 
trial, S briefly tapped the left dial and then moved 
to adjust the right dial. There were, then, 12 left
to right travel movements and 12 right to left travel
movements during each trial.

Fig. 2. The four experimental conditions used to
investigate the effects of precision on movement
durations.

An essential portion of the apparatus was the light 
above each dial which signaled a correct setting. A 
circuit was arranged so that when a mark on the 
dial face was correctly aligned with the target mark 
above the dial, the signal light would go on. Align- 
ing one dial automatically caused the light above the 
other dial to go off. The system of signal lights per- 
mitted E to maintain a constant standard of accu- 
racy for all Ss since Ss were instructed to “work as 
fast as you can” but “always be sure the light is on 
before moving to set the next dial.” 

The dial setting task was designed to involve visual 
cues primarily. There were no perceptible tactual 
cues to aid the operator. The relays in the signal 
light circuit were housed in a soundproof box to 
eliminate auditory cues. The four marks on the dials 
were arranged so that, for each setting, a different 
amount of rotation was required to bring the dial 
into alignment. Since S did not replicate the same 
rotation each time, kinesthetic cues were minimized. 
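The unequal rotations follow from the mark positions given earlier (0°, 70°, 170°, and 285°). The Python sketch below computes the angular step from each mark to the next; the function name is illustrative, and the direction of rotation, which the text does not state, affects only the order of the steps.

```python
MARKS_DEG = [0, 70, 170, 285]  # mark positions on each dial face


def successive_rotations(marks):
    """Angular step from each mark to the next, wrapping past 360 degrees."""
    n = len(marks)
    return [(marks[(i + 1) % n] - marks[i]) % 360 for i in range(n)]


steps = successive_rotations(MARKS_DEG)        # four different steps
degrees_per_trial = 3 * sum(steps)             # 12 settings per dial
turns_per_trial = degrees_per_trial / 360
```

The four steps differ from one another but sum to one full turn, so 12 settings carry each dial through three complete turns, as stated.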

Procedure. The experimental conditions consisted 
of the four combinations of fine and gross settings 
pictured in Fig. 2. Condition I involved a gross 
manipulation of both dials. Condition II involved 
a fine manipulation of the left dial and a gross ma- 
nipulation of the right dial. Condition III involved 
a gross manipulation of the left dial and a fine ma- 
nipulation of the right dial. Condition IV involved 
a fine manipulation of both dials. 

Twenty-four right-handed naval enlisted men (mean 
age 22.1; SD 3.9) served as Ss. Each S reported on 
four days during a five-day period. On the first 
day each S was assigned to one of the 24 possible 
sequences of the four experimental conditions, and 
he continued to perform in that sequence during the 
following three sessions. All Ss, then, performed on 
all experimental conditions during each of the four 
sessions. A session consisted of 12 trials, three on 
each of the four experimental conditions. S com- 
pleted the three trials on one condition before per- 
forming on the next condition in his sequence. 
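The counterbalancing scheme can be illustrated in Python. Four conditions admit 4! = 24 orders, which matches the number of Ss; the one-order-per-S assignment below is an assumption (the text says only that each S was assigned to one of the 24 sequences), and the function names are hypothetical.

```python
from itertools import permutations

CONDITIONS = ("I", "II", "III", "IV")


def assign_sequences(subject_ids):
    """Give each S one of the 24 possible condition orders.

    Assumes orders are handed out in turn, so 24 Ss each receive a
    different order.
    """
    orders = list(permutations(CONDITIONS))
    return {s: orders[i % len(orders)] for i, s in enumerate(subject_ids)}


def session_trials(sequence, trials_per_condition=3):
    """Trial-by-trial condition list for one session: all trials on one
    condition are completed before the next condition begins."""
    return [c for c in sequence for _ in range(trials_per_condition)]


assignment = assign_sequences(range(1, 25))
first_session = session_trials(assignment[1])
```

Each session then consists of 12 trials, three on each condition, in the order fixed for that S on the first day.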


Predictions 


On the basis of previous experimental re- 
sults (Simon, 1956) which indicated that: 
(a) increasing the perceptual loading of a 
movement increases its duration and (b) in-
creasing the perceptual loading of one part of 
a task increases the durations of certain other 
parts of the task; and assuming that increas- 
ing the precision requirements of a movement 
is one method of increasing its perceptual 
loading, several specific predictions were made. 

1. Durations of travel movements will be 
influenced by the precision of the manipula- 
tion which precedes the travel. With travel 
direction and precision of the subsequent ma-
nipulation held constant, travel movements
will be slower when preceded by a fine ma-
nipulation than when preceded by a gross
manipulation. In terms of the code notations
in Fig. 2, the following relations should be
observed: T.>T:, Te> Ts, Te > to See
Te > Ta

2. Durations of travel movements will be
influenced by the precision of the manipula-
tion which follows the travel. With travel
direction and precision of the preceding ma-
nipulation held constant, travel movements
toward a fine manipulation will be slower than
travel movements toward a gross manipula-
tion. Again in terms of Fig. 2, the following
relations should be observed: T; > T,, T;
Ts, Ts > Torand Ts > Te.

3. Since precision of manipulations either
preceding or following a travel movement will
affect the duration of the travel, movements
between two fine manipulations will be slower
than movements between two gross manipula-
tions. In terms of Fig. 2, T; > T, and
Ts > Ts.


Table 1


Mean Duration (Seconds) of Parts of Dial Setting Task under Four Experimental Conditions
for Four Practice Sessions

Column headings: Parts of Task; Experimental Condition; Component-Condition Code; Sessions.
Parts of task: Left to right travel; Right to left travel; Left dial manipulation; Right dial manipulation, each under Conditions I, II, III, and IV.
Tabled values (row and session assignment not recoverable):
4.55, 4.97, 5.18, 5.46
4.47, 5.08, 5.09, 5.44
8.89, 17.36, 10.37, 17.87
15.37, 8.26, 16.01
8.88, 10.46, 19.69, 20.36
8.13, 8.08, 17.16, 17.82


Results 


The performance of 24 Ss during their 
fourth session was analyzed to determine the 
effects of precision as a variable on the dura- 
tions of the four parts of the dial setting task. 
For each S, a median time was determined 
for each part of the task under each experi- 
mental condition. 

Four separate analyses of variance were 
performed,2 one for each part of the task; 
viz., left dial manipulation, right dial ma- 
nipulation, left to right travel, and right to 
left travel. These analyses indicated that the 
experimental conditions produced significant 
(p < .01) variations in the durations of all 
four parts of the task. 


2 Summaries of the analyses of variance have been
deposited with the American Documentation Insti-
tute. Order Document No. 6024 from ADI Aux-
iliary Publications Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remit-
ting in advance $1.25 for microfilm or $1.25 for
photocopies. Make checks payable to Chief, Photo-
duplication Service, Library of Congress.


Table 2


Predicted Relationships between Movement Durations
and Tests of Observed Differences

Column headings: Prediction; Observed Difference Between Means (Seconds).
Tabled values (row assignment not recoverable): 31, 18; all < .01.

Note. If the error rate per experiment (see Ryan) is set at .01,
this t is not significant. The rest of the starred values are
significant in terms of error rate per experiment as well as error
rate per comparison.


Duration of Travel Movements 


Table 1 presents the mean durations of the
parts of the task under each experimental con-
dition over the four practice sessions. Using
scores from the fourth session, 10 t tests be-
tween correlated means were computed in or-
der to test the specific predictions regarding
the effects of varying precision on travel
movement durations.
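A t test between correlated means treats each S's two scores as a pair and tests the mean of the within-S differences. The Python sketch below shows the computation; the function name is illustrative and the sample values are hypothetical, not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """t statistic and degrees of freedom for correlated (paired) means.

    Each position in the two lists holds the scores of the same S under
    two conditions. t is the mean difference divided by its standard
    error, with n - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical travel times (sec.) for three Ss under two conditions.
after_fine = [2.0, 4.0, 6.0]
after_gross = [1.0, 2.0, 3.0]
t_value, df = paired_t(after_fine, after_gross)
```

The resulting t is referred to the t distribution with n - 1 degrees of freedom; with 24 Ss, as in the present study, each test would have 23.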

Table 2 summarizes the results of these
tests. The first prediction stated that with
travel direction and precision of the subse-
quent manipulation held constant, travel
movements will be slower when preceded by
a fine manipulation than when preceded by a
gross manipulation. This prediction was veri-
fied by t tests No. 1-4.

The second prediction stated that with
travel direction and precision of the preced-
ing manipulation held constant, travel move-
ments toward a fine manipulation will be
slower than travel movements toward a gross
manipulation. This prediction was verified
by t tests No. 5, 6, and 7 but was not sup-
ported by No. 8, which was the right to left
movement preceded by a fine manipulation.

The third prediction stated that move-
ments between two fine manipulations will be
slower than movements between two gross
manipulations. This prediction was verified
by t tests No. 9 and 10.


Duration of Manipulative Movements 


Table 1 shows that setting Dial Type A 
(fine manipulation) required about twice as 
much time as setting Dial Type B (gross ma-
nipulation). The t tests of My vs. Me and 
M; vs. M; on Session 4 indicated that the 
duration of the gross manipulation was not 
affected by being paired with a fine manipula- 
tion rather than another gross manipulation. 
During the earlier sessions, however, the gross 
manipulation was significantly slowed when 
paired with a fine manipulation. A compari- 
son of M; and M; suggests that a fine adjust- 
ment is faster (p < .05) when paired with a 
gross adjustment than when paired with a 
fine adjustment. This relationship, however, 
was not substantiated by the Mx vs. Mg com- 
parison. 


Changes with Practice 


The major effects of the experimental con- 
ditions on the durations of the movement 
components noted during Session 4 appeared 
consistently over the first three practice ses- 
sions as well. The average decrease in the 
duration of the manipulative portions of the 
task from Session 1 to Session 4 was 13%, 
while the duration of the travel movements 
decreased only 4% over the same period. 
There appeared to be no tendency for the 
precise manipulations to improve more with 
practice than the gross manipulations. On 
the contrary, the most marked practice effect 
was observed for the gross manipulation un- 
der the condition where the other terminal 
manipulation involved a fine adjustment. 


Discussion 


This study clearly demonstrates that the 
time required by operators to move between 
two adjustments in a repetitive dial setting 
task depends on the precision of those ad- 
justments. Travel movements following a 
fine adjustment are significantly slower than 
movements following a gross adjustment. In 
general, travel movements toward a fine ad-
justment are significantly slower than move- 
ments toward a gross adjustment. Thus, an 
increase in travel time is associated with an 
increase in the precision of the adjustment 
either preceding or following the travel. 

It is important to point out that variation 
in precision was accomplished without alter- 
ing in any observable way the make-up of the 
travel movement between the dials. In other 
tasks used to investigate the effects of toler- 
ance requirements on movement times, the 
character of the travel movement is usually 
modified in that the positioning necessary to 
perform the terminal manipulation is changed. 
Examples of this latter type of task would be 
tapping in a target area where the size of the 
target is varied (Fitts, 1954) or assembling 
pegs into holes where the difference between 
peg size and hole size is changed (Maynard, 
1956, Sec. 4, p. 95). In the present study, all 
movements were between objects of constant 
size in a fixed location. It is difficult to see 
how the slowing of travel which accompanied 
increased precision of manipulation could be 
related to changed requirements for position- 
ing. 

If the content of the travel movement was 
not altered, i.e., no elements added or changed, 
how are the present results to be explained? 
It appears that the speed of an operator’s
control movements is determined not only
by the specific requirements of the individual 
movement components but by the character- 
istics of the task as a whole. This study 
points out the close interrelation between 
parts of a work cycle. It is an oversimplifi- 
cation to conceive of a task as an additive 
combination of separate and independent ele- 
ments. 

Suppose that the standard time allowance 
for the travel part of a task was derived from 
a situation where the operator moved be-
tween two gross adjustments. How much 
would this time allowance be in error if it 
were applied to other situations in which the 
precision requirements of the manipulative 
portions of the task were different? The pres- 
ent study indicates that the duration of the 
travel movement increases 8% when the pre-
cision requirements of one of the terminal
manipulations are increased and that the dura-
tion of this same basic travel movement in- 
creases 11% when the precision requirements 
of both terminal manipulations are increased. 
These differences, though small, are highly 
reliable and statistically significant. Whether, 
with the present state of the art of industrial 
time study practices, these statistically sig- 
nificant differences are of practical signifi- 
cance is another question. 


Summary 


This study was concerned with the effects 
of precision as a variable on movement dura- 
tion. Ss were required to adjust alternately 
each of two dials on a control panel. The 
precision required to adjust each dial was 
systematically varied and the effects of this 
variation on the durations of four parts of 
the control movement were determined. An 
electronic motion analyzer recorded separately 
and automatically the durations of each part 
of the task: i.e., the two dial adjustments and 
the two travel movements. 

Results clearly demonstrated that the time
taken by operators to move between adjust-
ments depended on the precision require-
ments of those adjustments. Travel move-
ments following a fine adjustment were slower
than movements following a gross adjustment,
and, in general, travel movements toward a
fine adjustment were slower than movements
toward a gross adjustment. These findings
indicate that the speed of control movements
is determined not only by the content of in-
dividual movement components but by the 
over-all characteristics of the task. Results 
provide additional evidence to refute the con- 
cept that a work cycle consists of an additive 
combination of independent elements. 


ON THE EQUIVALENCE OF CLINICAL AND
STATISTICAL METHODS 1


DANIEL SYDIAHA 


University of Saskatchewan 


This paper seeks to resolve an inconsist- 
ency in Meehl’s (1954) analysis of statisti- 
cal vs. clinical methods of assessment. (For 
purposes of this paper, the term “statistical 
method” refers to the arithmetic combination 
of data by means of an equation or table. 
“Clinical method” refers to the process where- 
by a judge makes a decision or diagnosis after 
“reflecting upon” all of the alleged relevant 
information at his disposal. This process may 
be said to be nonmechanical or informal 
(Meehl, 1954, p. 16) and it may involve in- 
tuition on the part of the clinician to a greater 
or lesser degree.) 

On the one hand, Meehl contends that of 
the two assessments, the clinical one is based 
on more information since it typically in- 
cludes information which has been described 
as unsystematic, qualitative, and idiosyn- 
cratic, in addition to the systematic infor- 
mation which is common to both. Repeat- 
ing Meehl’s illustration, a statistical clerk is 
unable to organize this unsystematic infor- 
mation and incorporate it into a decision or 
diagnosis, as a clinician does. On the other 
hand, Meehl’s summary of the empirical evi- 
dence suggests that clinical predictions are 
no better than statistical ones. 

There are at least two ways of resolving 
this inconsistency: 

Assumption A: Meehl’s contention may be 
illusory, i.e., the clinician may have the mis- 
taken belief that unsystematic information is 
being incorporated into his decision, whereas 
in fact such information may contribute noth-
ing new, but may merely confirm a set estab-
lished by previously scanned systematic in-
formation. It would follow from this that
statistical and clinical judgments would be
equivalent, i.e., both methods would generate
the same decisions.


1 Based on a thesis submitted to McGill University
in partial fulfillment of the requirements for the
Ph.D. degree, 1958. The author gratefully acknowl-
edges the assistance of members of the Canadian
Army Personnel Selection Service, including its direc-
tor, W. R. N. Blair. E. C. Webster directed the re-
search, and John Kenyon assisted with computa-
tions. Financial assistance was provided by the
Psychiatric Services Branch, Saskatchewan Depart-
ment of Public Health, and by the Defence Research
Board of Canada, Grant No. 9435-53 to E. C.
Webster.


Assumption B: Unsystematic information 
might be subject to misinterpretation to a 
greater extent than is systematic information, 
i.e., clinical judgments may be subject to
such factors as interclinician bias, halo effects,
and intraclinician inconsistency to such an ex-
tent as to effectively contribute nothing to
the prediction. It would follow from this
that the two methods need not generate the
same decisions, and the clinical decision
would have a greater degree of error associ-
ated with it. The purpose of this investiga-
tion was to determine which of these two
alternative assumptions was more plausible.

The method involved developing two ra- 
tional, explicit decision making models, one a 
so-called statistical model, and the second a 
so-called clinical model, whose formal prop- 
erties approximated, at least in part, the 
“real” properties of the two respective assess- 
ment methods. Both models were designed 
to generate decisions on the basis of the fol- 
lowing considerations: 

(1) The “raw data” upon which the model 
decisions were based was provided by inter- 
viewers regularly engaged in making clinical 
judgments; in this instance Canadian Army 
personnel officers selecting recruits for the 
Canadian Army, regular force. 

(2) Both models were designed to maxi- 
mize the correlation between model decisions 
and the actual decisions made by the inter- 
viewers. In effect, the adequacy of the mod- 
els in accounting for the interviewers’ deci- 
sions was being investigated. 

The following hypotheses were advanced, 
consistent both with Meehl’s analysis and 
with Assumption B outlined above: 

H1: Decisions generated by a clinical deci-
sion making model correspond more
closely to real decisions than do decisions
generated by a statistical decision mak-
ing model.

H2: Decisions generated by clinical and sta-
tistical decision making models are not
equivalent, i.e., are imperfectly corre-
lated.

H3: Decisions generated by a clinical deci-
sion making model are more unreliable
than are decisions generated by a sta-
tistical decision making model, i.e., they
are subject to interclinician bias, halo
effect, and intraclinician inconsistency.


Attention should be drawn to the definition 
of the word “clinical” in the opening para-
graph. According to Meehl (p. 15) “clini-
cal” refers both to data, i.e., nonsystematic
kinds of information, and to method, i.e.,
nonmechanical methods of combining data.
(The word “systematic” is considered prefer-
able to “psychometric” as used by Meehl,
since much data which is “nonpsychometric”
is “systematic” and can be combined me-
chanically. Example: biographical informa-
tion.) Meehl’s primary interest is in com-
paring methods applied to the analysis of
systematic data only. Apart from consider-
ing the logical status of nonsystematic data
(Chap. 6, pp. 37-67), he has devoted little
attention to this aspect of the problem.
The point of view adopted in this paper is 
that empirical comparisons of methods should 
include both systematic and nonsystematic 
data. Presumably this failure to consider 
nonsystematic data stems from the inability 
to treat such data mechanically. (This pre- 
sumption is supported later in this paper.) 
Consequently, if one is to include nonsys- 
tematic data in the comparison of methods, 
one is left with a comparison between sys- 
tematic data treated mechanically (the so- 
called statistical method) and both system- 
atic and nonsystematic data treated nonme- 
chanically (the so-called clinical method), 
which is the comparison examined in this pa- 
per. Admittedly, there is a confounding of 
data and method (as defined by Meehl) in 
this design. In defense of this design, it is 
to be noted that clinical methods (as defined 
herein) are in widespread use, for any num-
ber of reasons, and the evaluation of such
methods, according to the hypotheses out- 
lined above, is deemed desirable. 


Procedure 


Rationale of Decision Making Models. A modified 
Q-sort method of personality description was used 
as the basis for the clinical decision making model. 
(See Stephenson [1949] for an elaboration of the 
relevance of Q sort to clinical methods.) Use of Q 
sort for this purpose was based on the assumption
that the essential operation involved in clinical as-
sessment was the preparation of a case report in
which the assessee is described in terms of dominant
traits, abilities, motives, etc. The case report, in
effect, represents the clinician’s explication or de-
fense of his decision or judgment. Thus the Q sort
and the case report are similar in that they are
methods of describing people, and they differ only
in that the Q sort is more systematic and includes
a fixed number of descriptive statements. This sys-
tematic aspect of the Q sort was desirable for pur-
poses of this study since it made possible the assess-
ment of errors of measurement according to H3 out-
lined above.

The discriminant function was used as the basis 
for the statistical decision making model. While the 
use of a linear model rather than curvilinear or in 
teractive models limits the validity of the results, it 
was selected as being the only known, practicable 
method of combining scores from 26 variables. It 
is acknowledged that the use of curvilinear or inter- 
active models might drastically alter the conclusions 
drawn in this paper. 

For purposes of convenience, the terms “clinical 
scores” and “statistical scores” will be used in the 
balance of this paper to designate decisions gener- 
ated by clinical and statistical decision making mod- 
els respectively. The reader is reminded that such 
scores refer only to the model decisions as defined 
above, and not to the actual decisions made by in- 
terviewers. 

Subjects. Each of eight Canadian Army Regular
Force personnel officers interviewed from 14 to 50
regular force applicants for “other” ranks in the
Canadian Army, i.e., ranks other than officers and
noncommissioned officers. Total N = 256. Inter-
views were conducted under usual Army circum-
stances except for additional information required
for this study, and the interviews were sound-re-
corded. Each case was assessed by only one officer,
who classified the applicant as either an “accept” or
“reject,” and who provided all of the information
required for this study. Reject cases included 20
applicants recommended for reassessment at a later
date and 7 applicants referred for psychiatric as-
sessment. Accept cases included 29 applicants con-
sidered only marginally suitable for Army service.

The typical induction procedure, following the 
personnel officer’s decision, was a review of this de-
cision by the commanding officer of the personnel
depot. If the application was approved by the com- 
manding officer, the recruit applicant was then sworn 
into the Army. The decision of the personnel officer 
was crucial to the applicant’s acceptance, even though 
the final decision was made by the commanding offi- 
cer, since the personnel officer’s decision was over- 
ruled by his commander very infrequently. 

The cases were divided into three separate groups
to permit cross-validation of findings. Cases pro-
vided by Officers A, B, F, and G (N = 38, 50, 50,
41) were randomly assigned to a criterion group
(N = 89) and a holdout group (N = 90). The re-
maining cases, i.e., those provided by Officers C, D,
E, and H (N = 14, 23, 22, 18), made up a second
holdout group (N = 77). (The terms “criterion
group” and “holdout group” are used to designate
the samples used in the two stages of cross-valida-
tion procedures, i.e., optimal scoring procedures
were developed with the criterion group, and these
procedures were then applied to the holdout groups.
Unless otherwise indicated, only results obtained
with holdout groups are reported.)
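
The random assignment described above can be illustrated with a short sketch. It is not the procedure actually used in the study, which is not described in detail; the function name, the case labels, and the fixed seed are assumptions introduced only for illustration. The group sizes (179 pooled cases divided into 89 and 90) are taken from the text.

```python
import random


def split_cases(case_ids, n_criterion, seed=0):
    """Randomly divide cases into a criterion group and a holdout group.

    The seed is a hypothetical choice that makes the sketch repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_criterion], shuffled[n_criterion:]


# Cases pooled from Officers A, B, F, and G: 38 + 50 + 50 + 41 = 179.
pooled = [f"case-{i}" for i in range(38 + 50 + 50 + 41)]
criterion, holdout = split_cases(pooled, n_criterion=89)
print(len(criterion), len(holdout))  # prints: 89 90
```

Scoring rules would then be derived from the criterion group only and applied unchanged to the holdout group, so that chance relationships found in the first sample do not inflate the reported correlations.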

Clinical Scores. Clinical scores were based on a 
120-item Q sort completed by the officer for each 
applicant at the termination of the interview and 
after he had made his decision. The 120 statements
were obtained in a preliminary critical incident sur- 
vey of 141 Army interviews in which personnel offi- 
cers related factors particularly important in leading 
to decisions to accept or reject applicants. Both ob- 
jective facts and subjective impressions were ob- 
tained in a list of 182 incidents, and these were 
classified into the 120 Q-sort items. The officers’ 
terminology was preserved as much as possible, and 
the content of each item was limited to a single trait 
or attribute. To facilitate the sorting procedure, a 
printed check list was used in which the officer as- 
signed the items to a 9-point continuum (unforced 
distribution) ranging from least to most descriptive 
of the applicant. 

In addition to a description of each applicant, 
each officer used the check list to describe an ideal 
applicant at the conclusion of the study. This 
“ideal” represented each officer’s judgment of the 
relative importance of the check list items in de- 
scribing applicants thought most likely to succeed 
in the Army. This was done to test the validity of 
the check list procedure, since if it were valid one 
would expect accepted applicants to correspond more 
closely to the ideal than would rejected applicants. 
A “congruence” score was calculated for each of the
256 descriptions, which consisted of a Pearson prod-
uct-moment correlation coefficient between each de-
scription and the ideal as described by the inter-
viewing officer. The magnitudes of such congruence
scores were compared for accepted and rejected
cases.
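
As an illustration of the congruence score, the sketch below correlates an applicant's item ratings with the same officer's "ideal" ratings. It is a minimal example under stated assumptions: the six ratings per profile are invented (the study used 120 items on a 9-point continuum), and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


# Invented ratings on the 9-point continuum, six items only.
ideal = [9, 8, 2, 1, 5, 7]
applicant_a = [8, 9, 3, 2, 5, 6]   # resembles the ideal
applicant_b = [2, 3, 8, 9, 5, 4]   # roughly the reverse of the ideal

congruence_a = pearson_r(applicant_a, ideal)
congruence_b = pearson_r(applicant_b, ideal)
print(congruence_a > congruence_b)
```

If the check list is valid in the sense described above, accepted applicants should tend to have higher congruence scores than rejected applicants.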

Clinical scores were obtained by assigning num- 
bers to items, ranging from 1 to 9, corresponding to 
the 9-point continuum, and by combining the num-
bers arithmetically. To maximize the correlation be-
tween clinical scores and acceptance-rejection, this
scoring system was limited to 67 items found to be
significantly related (.05 level) to acceptance-rejec- 
tion for the criterion sample, using point-biserial 
correlation as the item-analysis procedure. 
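
The item-analysis step can be sketched as follows. This is an illustration only, not the scoring actually used: the item names and ratings are invented, a fixed cutoff on the absolute correlation stands in for the .05 significance test, and the rule of reflecting negatively related items (10 minus the rating) is an assumption, since the text says only that the numbers were combined arithmetically.

```python
from math import sqrt


def point_biserial(ratings, accepted):
    """Point-biserial r between item ratings and a 0/1 accept-reject criterion."""
    n = len(ratings)
    group1 = [r for r, a in zip(ratings, accepted) if a]
    group0 = [r for r, a in zip(ratings, accepted) if not a]
    if not group1 or not group0:
        raise ValueError("both accepted and rejected cases are required")
    mean_all = sum(ratings) / n
    sd = sqrt(sum((r - mean_all) ** 2 for r in ratings) / n)
    if sd == 0:
        raise ValueError("item has no variance")
    p = len(group1) / n
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * sqrt(p * (1 - p))


def select_items(item_ratings, accepted, min_abs_r):
    """Keep items whose |r| reaches the cutoff; record the direction of each."""
    keyed = {}
    for item, ratings in item_ratings.items():
        r = point_biserial(ratings, accepted)
        if abs(r) >= min_abs_r:
            keyed[item] = 1 if r > 0 else -1
    return keyed


def clinical_score(ratings_by_item, keyed):
    """Sum ratings over selected items, reflecting negatively keyed items."""
    return sum(
        ratings_by_item[item] if sign > 0 else 10 - ratings_by_item[item]
        for item, sign in keyed.items()
    )


# Invented criterion-group data: three accepted and three rejected cases.
accepted = [1, 1, 1, 0, 0, 0]
item_ratings = {
    "steady work history": [8, 7, 9, 2, 3, 1],
    "evasive in interview": [2, 1, 3, 8, 9, 7],
    "tall": [5, 5, 5, 5, 4, 6],
}
keyed = select_items(item_ratings, accepted, min_abs_r=0.5)
new_applicant = {"steady work history": 9, "evasive in interview": 1, "tall": 5}
print(keyed, clinical_score(new_applicant, keyed))
```

Because the items are chosen on the criterion sample, the resulting scores are meaningful as evidence only when checked on cases that played no part in the selection, which is the role of the holdout groups.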

The reader may feel that these statistical pro- 
cedures have rendered the clinical decision making 
model somewhat statistical in nature, thus invalidat- 
ing the distinctions made in the opening paragraph 
of this paper. It is to be noted, however, that no 
validity for the model is claimed beyond the assump-
tion inherent in the model, i.e., it is assumed that
the Q sort approximates, at least in part, the activi- 
ties of a clinician. To the extent that this assump- 
tion is warranted, the results are valid. Further- 
more, to argue that setting up any artificial clinical 
situation (such as requiring the clinician to perform 
a Q sort) destroys its essential characteristics is to 
render the analysis of clinical decision making proc- 
ess untestable. 

Statistical Scores. Statistical scores were based on
test and biographical data regularly examined by
Army personnel officers, namely the Army M test
(a classification test comprising mechanical, verbal
and nonverbal intelligence scores), the MMPI, and
a biographical data sheet. An attempt was also
made to scale interview content for inclusion in the
statistical score, in line with Meehl’s argument that 
comparative studies of statistical and clinical judg- 
ments should be based on the same information. A 
method of content analysis was developed and ap- 
plied to the criterion sample of one officer. The 
content items significantly related to the accept- 
reject criterion were then applied to the holdout 
sample for the same officer: the number of items 
which remained significant was below chance ex- 
pectation, and it was assumed that interview con- 
tent could not be scaled. Since the officer chosen 
for analysis was judged to be the most systematic 
interviewer of the eight, no further attempts to scale 
interview content were made. 

Scores corresponding to statistical decisions were 
computed from a multiple-regression equation (dis-
criminant function) so as to maximize the correla-
tion between the obtained scores and acceptance-
rejection for the criterion group. Thirteen items of
biographical and test data were used, and these were 
selected at random from a total of 26 items which 
were available. (Computational labor involved for 
26 items was considered prohibitive. Subsequent 
analysis tended to justify use of only 13 items in 
that all 26 items combined by a standard score 
method gave comparable results: see Table 1, Col. 3 
and 4.) 

Following Ezekiel (1947) and McNemar (1955), 
selection of items was made on a random basis 
rather than on the basis of item analysis, as was 
done with clinical scores. Ezekiel and McNemar 
have pointed out that for multiple correlation meth-
ods, item selection tends to capitalize on chance
fluctuations in test distributions, with the result
that marked shrinkage occurs in the correlation from
criterion to holdout groups. Such shrinkage was in
fact found to occur in this study.

Table 1
Intercorrelations among (a) the Accept-Reject Criterion, (b) Statistical Scores, and
(c) Clinical Scores, for Holdout Samples of Cases

[Table body illegible in the source. Rows: Samples A through H, A+B+F+G, and C+D+E+H. Recoverable column headings: Sample, N, 13 Items, 26 Items.]

* Point-biserial correlations.
** Product-moment correlations.
*** Officer C had no rejected applicants in his sample. Consequently these correlations
are either indeterminate or not comparable to the others shown.


Results


Validity of Check List Procedure


Congruence scores were significantly higher
for accepted than for rejected applicants
(rpbi = .78 between acceptance-rejection and
the distribution of congruence scores). It
was assumed that the check list discriminated
between accepted and rejected applicants.


Correlations of Acceptance-Rejection with
Clinical and Statistical Scores


Clinical scores correlated with acceptance-
rejection at a higher level than did statisti-
cal scores (see Col. 3 and 6, Table 1). This
result tended to support H1: decisions gener-
ated by the clinical model corresponded more
closely to real decisions than did decisions
generated by the statistical model.


Correlation between Statistical and Clinical
Scores


Correlations were .36 and .52 for the two
holdout groups (see Table 1, Col. 8). Thus
H2 was supported in that although these cor-
relations are significantly different from zero,
they do not approximate 1.00, and hence
there was reason to believe that the two sets
of scores were not equivalent.


Sources of Error Associated with Clinical 
Scores 


1. Interofficer bias 


There was no marked tendency for indi- 
vidual officers to limit their descriptions to 
specific Q-sort items since there were no sig- 
nificant differences among the officers’ sam- 
ples in the magnitude of the correlation be- 
tween clinical scores and acceptance-rejection 
(see Col. 6, Table 1; p> .2 as tested by the 
method described in Snedecor [1948], p. 
154. All correlation differences reported in 
this paper were tested by this method). In 
other words, the 67 selected Q-sort items pre- 
dicted the decisions of all interviewers equally 
well. 

Verification of this result was obtained from 
two other sources: 

(a) The intercorrelations of ideal appli- 
cants described by officers were high (range 
.56 to .98; median = .81), which suggested 
that they were looking for approximately the 
same attributes in applicants. 

(b) A second set of clinical scores was 
computed based on the same considerations 
as are outlined above, but with a separate set 
of items selected for each of the four officers, 
A, B, F, and G. In effect, a separate decision 
making model was developed for each officer,
thus permitting individual officer biases to in-
fluence the results of this second set of clini-
cal scores to a maximum degree. The two
sets of scores obtained, however, were not
different (see Table 2), which tended to sug-
gest that interofficer biases were negligible.
Apparently the criterion variance accounted
for by items common to all four officers was
equally well accounted for by the criterion
variance specific to each officer.


Table 2

Correlations between Clinical Scores, as Determined by Two Methods of Item Scoring,
and the Accept-Reject Criterion

Samples: A, B, F, G, A+B+F+G
Items Selected for Individual Officers (N, rpbi): [illegible in the source]
Items Selected for Combined Sample (N = 67 items), rpbi in source order: .92, .92, .68, .82, .85 a

a Mean correlation.


2. Reliability


Split-half reliability and internal consist-
ency of clinical scores were high (see Table 3).
There were significant differences among offi-
cers for the coefficients of homogeneity ob-
tained (p < .001) but not for the coefficients
of split-half reliability (p > .1).
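
The two reliability indices named above can be sketched as follows. Several assumptions are made and should be read as such: Kuder-Richardson Formula 20 is shown in its usual form for dichotomously scored (0/1) items, whereas the text does not say how the 9-point ratings were treated; the split is odd versus even items; and the Spearman-Brown step-up is included although the text does not state whether it was applied. The response matrix is invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kr20(responses):
    """KR-20 for rows of 0/1 item scores (one row per person)."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


def split_half(responses):
    """Odd-even half correlation and its Spearman-Brown stepped-up value."""
    odd = [sum(row[0::2]) for row in responses]
    even = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(odd, even)
    return r_half, 2 * r_half / (1 + r_half)


# Invented data: five people, four dichotomous items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
homogeneity = kr20(responses)
r_half, stepped_up = split_half(responses)
print(round(homogeneity, 3), round(r_half, 3), round(stepped_up, 3))
```

The homogeneity coefficient reflects how consistently all items rank the same people, while the split-half coefficient compares only two halves of the item set, which is one reason the two indices can behave differently across officers.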


3. Halo effect 


Intercorrelations among ten randomly se-
lected Q-sort items were low (range from
—.45 to .25; mean = .016; when items were 
reflected so as to make them uniform in di- 
rection as to favorability, mean = .081). This 
result tended to suggest the absence of halo 
effects and other such “errors of association” 
in which raters attribute unwarranted rela- 
tionships among items (Guilford, 1954, pp. 
278-280). 


Table 3

Coefficients of Homogeneity (Kuder-Richardson Formula 20) and Reliability (Split-half) for
Clinical Scores, Based on 67 Selected Items

[Table body illegible in the source. Columns: Sample; Coefficient of Homogeneity (Criterion Sample, Holdout Sample); Coefficient of Reliability (Criterion Sample, Holdout Sample). Rows: Samples A through H, A+B+F+G, and all cases (A-H).]

a Clinical scores for Officer C were restricted in range since there were no rejected applicants.


Sources of Error Associated with Statistical
Scores


There was no marked tendency for the in- 
dividual officer samples to be associated with 
specific items of test and biographical data 
since there were no significant differences 
among the officers’ samples in the magnitude 
of the correlation between statistical scores 
and acceptance-rejection (p > .8, see Table 1). 
In other words, the statistical decision mak- 
ing model worked equally well for all officer 
samples. 

The assessment of the reliability of test 
and biographical data was considered un-
necessary in view of the high reliability co-
efficients obtained for clinical scores, i.e., H3
was rejected on grounds that the reliability of 
clinical scores was at least as high as the reli- 
abilities usually obtained with psychological 
tests. 


Discussion 


Hypotheses 1 and 2 were confirmed: the 
clinical decision making model corresponded 
more closely to real decisions than did the 
linear statistical decision making model, and 
the two models were imperfectly correlated. Meehl’s
contention that the clinician performs a dis- 
tinctive role of interpreting and classifying 
idiosyncratic information was thus supported. 
Assumption A, i.e., that the clinical incorpora- 
tion of unsystematic data was illusory, was 
rendered implausible by these results. 

This study failed, however, to reveal any 
substantial degree of error associated with 
clinical decision making models, as might 
have been expected by alternative Assump- 
tion B. This is rather surprising in view of 
Meehl’s review of the evidence, cited earlier, 
which indicated that clinical predictions are 
no better than statistical ones. If the result 
of the decision making process (the clinical
prediction) contains error, then one might ex- 
pect to find sources of error in the decision 
making process itself. 

There are at least five possible explanations 
for this discrepancy between the implied pres- 
ence of error associated with clinical judg- 
ment in the studies reviewed by Meehl, and 
the apparent absence of error associated with 
the clinical scores obtained in this study: 


(a) Halo effects and other errors of association
may have been present in the data of this study but
may not have been adequately assessed by means of
the 10 items selected.

(b) It is still possible for interviewers to agree
among themselves as to what traits or attributes are
important in assessment, and still disagree as to
standards of acceptance required of an interviewee.
This would result in differences among interviewers
in their decisions about the same applicants.

(c) It is also possible that the criterion variance
unaccounted for by the clinical decision making
model used in this study was subject to interviewer
bias. Since the 67 Q-sort items accounted for ap-
proximately 64% of the decision variance (the cor-
relation between acceptance-rejection and clinical
scores was approximately .8) the balance of the vari-
ance, i.e., 36%, was unaccounted for, and may have
included some error variance.

(d) The circumstances under which the study was
conducted may have been such as to make the in-
terviewers more cautious and hence more reliable in
their decisions than they would have been normally.

(e) The interviewers may have attended to invalid
information or they may have applied nonoptimal
weights to information examined.


The assessment of these factors was beyond
the scope of this study. Factors (b) and (c)
would require that a sample of interviewers
interview the same applicants. Regarding
(d), all officers reported at the conclusion of
the project that they had given more thought
to their decisions during the course of the
study than was customary. There was also
reason to believe that personnel selection in
the Canadian Army may differ from most se-
lection procedures in that the reports of per-
sonnel officers are carefully examined by their
superiors in the Personnel Selection Service.
This might be expected to result in greater
uniformity among interviewers.
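
The figures quoted under explanation (c) above follow from squaring the correlation. As a worked step, taking the approximate value r = .8 reported in the text:

```latex
r^{2} \approx (0.8)^{2} = 0.64, \qquad 1 - r^{2} \approx 1 - 0.64 = 0.36
```

That is, about 64% of the variance in accept-reject decisions is associated with the clinical scores, and the remaining 36% is the portion within which any interviewer bias would have to lie.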

It should be understood that the validity 
of the decision making models developed in 
this study was not investigated, and conse- 
quently Factor (e) could not be evaluated. 
The utility of Q-sort methods of the kind used 
here must be evaluated with caution (a fol- 
lowup is being planned in which the clinical 
and statistical scores will be correlated with 
criterion measures based on the inductees’ 
performance records following three years 
regular Army Service). A further limitation 
which bears noting is the fact that this is the 
first study of its kind, and further research 
into the appropriateness of Q sort to other
clinical situations is obviously needed. 

Keeping these reservations in mind, this
study nonetheless strengthens the status of 
the clinical assessment. The results obtained 
suggest that in spite of idiosyncrasies of in- 
terviewing style and content, it is possible 
for interviewers to describe their interviewees 
in comparable and consistent terms. This is 
a rather startling discovery, in view of the 
critical comments which are often directed 
toward clinical methods, a criticism which, 
as a matter of interest, was shared by the 
author at the beginning of the study. Fur- 
thermore, the fact that the clinical scores, as 
developed herein, correlate at a low level with 
biographical and test data suggests that such 
scores might profitably be included as sepa- 
rate test scores in a selection battery. The 
development of such Q-sort-derived measures 
might be a more fruitful line to pursue in test 
development than the development of con-
ventional tests. As Cronbach and Meehl
(1955) have pointed out, the validity of the
clinician’s constructs deserves a place in psy-
chological thinking and research.


Summary 


Linear statistical and clinical models of per-
sonnel assessment were compared with respect
to: (a) correlation with interview decisions,
(b) correlation between models, and (c)
errors of measurement.

Eight interviewers assessed from 14 to 50 
Canadian Army applicants using information 
obtained from biographical and test data, 
and from interview conversation. Each ap-
plicant was described on a 120-item Q-sort
check list. These data were quantified and 
combined into composite statistical scores 
(biographical and test data) and clinical 
scores (Q-sort data). 

The results indicated that: (a) clinical
scores were associated more closely with de-
cisions than were statistical scores; (b) sta-
tistical and clinical scores correlated at a low
level; (c) the decisions of different interview-
ers were associated with the same Q-sort,
biographical, and test data; and (d) sta-
tistical and clinical scores were comparable
in reliability.

It was concluded that the clinical model of 
assessment was not reducible to the linear sta- 
tistical model and that interviewers’ methods 
were comparable and consistent under the ex- 
perimental conditions used. 


The study reported here investigates some
conditions under which groups fail to utilize 
the resources of the most knowledgeable mem- 
ber in a group decision making situation. 
The experimental conditions include the pres- 
ence or absence of a silent recess or incuba- 
tion period imposed during the group discus- 
sion as well as variations in group size. 

In the extensive body of literature concern- 
ing the relative problem solving performance 
of groups and individuals (Kelley & Thibaut, 
1954), it is sometimes maintained that vari- 
ous group factors such as group power struc- 
ture (Torrance, 1954; Ziller, 1955) and pres- 
sures toward uniformity frequently interfere 
with the optimum utilization of the group’s 
resources but particularly the resources of the 
most well informed member. By the expedi- 
ent of including an accomplice with a correct 
answer and a correct problem solving process 
in each experimental group, it was possible to 
design and conduct a more definitive analysis 
of this proposition. 

With regard to group size, the influence of 
the informed member (advocate hereafter) 
was expected to decrease as the size of the 
group increases; for as the size of the group 
increases the probability of an opposed coali- 
tion may be expected to increase. Moreover, 
the number of participants was expected to be 
related inversely to the perceived prominence 
of the advocate’s arguments. 

A recess or incubation period was intro- 
duced as an independent variable in order to 
investigate the correlates of incubation in a 
group setting. Research regarding incubation 
in an individual problem solving situation 
suggests that interferences resulting from emo-
tional causes and sets may dissipate under
conditions of enforced delay by serving to
interrupt a set and allow competition from
other sets (Schachter, 1951). The “sleeper
effect” (Hovland & Weiss, 1951), that is, the
paradoxical increase in opinion change which
sometimes appears after a lapse of time fol-
lowing an induction attempt, lends further
support to this hypothesis. Thus it was pre-
dicted that a highly informed group member
(advocate) is more effective in a group de-
cision making situation in which an incuba-
tion period is imposed in comparison with a
situation in which there is no such break in
the group decision making process.

1 This report is an extension of a paper presented
at the APA meeting, Washington, D. C., September
1958.

2 The authors wish to thank their associates in the
Fels Group Dynamics Center who provided valuable
feedback for the revision of the original manuscript.


Method 


Subjects. One hundred ninety-nine male and fe-
male University of Delaware summer session stu-
dents (largely public school teachers) served as Ss.
The experiment was conducted with social science
classes during the regular class period.

Task. The decision making task required the
group or individual members to estimate the num-
ber of dots on a slide containing 1,050 black dots
scattered rather uniformly, yet in no geometric array,
over a white background in a figure resembling a
ping-pong paddle framed by a rectangle of minimum
dimensions. The slide was exposed for only 5 sec.

Experimental Procedure. Essentially, a 2 × 4 ex-
perimental design was employed involving four levels
of group size (two-, three-, four-, and five-person
groups) and the presence or absence of an incuba-
tion period. An accomplice, the advocate, was in-
cluded in each group under the eight experimental
conditions. Thus, for example, the two-person
groups were actually composed of a naive S and an
accomplice.

The advocates or accomplices were selected ran-
domly from the various classes participating in the
study and were instructed as to their role on the
day prior to the experimental session. The method
which the advocates were asked to employ required
them to envision the paddle-shaped dot display en-
closed by the rectangle of minimum dimensions. The
paddle-shaped figure occupied approximately one-
half of the area within the rectangle. Thus, the
number of dots within the figure could be approxi-
mated by estimating the width and length of the
rectangle in dots and dividing the product by two.
The advocates were informed as to the actual di- 
mensions (in dot-units) of the rectangle, the arith- 
metic process for approximating the correct answer, 
and the correct number of dots, 1,050. Finally, each 
advocate was asked to attempt to persuade his 
group to decide on the estimate of 1,050 dots or as 
near to that number as possible without, of course, 
revealing his role in the experiment. 
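The advocates' shortcut amounts to a single line of arithmetic. The following Python sketch illustrates it; the rectangle dimensions of 42 by 50 dot-units are hypothetical, chosen only so that half the product equals 1,050, since the actual dimensions are not reported here.

```python
def estimate_dots(width_dots: int, length_dots: int, fill_fraction: float = 0.5) -> float:
    """Approximate the number of dots in a figure that fills `fill_fraction`
    of its bounding rectangle, given the rectangle's sides in dot-units."""
    return width_dots * length_dots * fill_fraction


# Hypothetical dimensions (not from the source): 42 x 50 dot-units.
estimate = estimate_dots(42, 50)
print(estimate)
```

The default `fill_fraction` of 0.5 reflects the statement that the paddle-shaped figure occupied approximately one-half of the rectangle.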

At the outset of the experimental period it was ex- 
plained to the naive Ss that the study was concerned 
with group decision making but particularly con- 
cerned with the relationship between group size and 
the quality of the group decisions. After assigning 
the members to varying sized groups (homogeneous 
with regard to sex) the slide was displayed for 5 sec., 
and the Ss submitted individual estimates as to the 
correct number of dots on the slide. In the next 
phase, the various sized groups dispersed to separate 
rooms to discuss the problem and reach a group de- 
cision in a maximum of 15 min. (an adequate pe- 
riod of time on the basis of previous studies involv- 
ing this task). 

Finally, when the groups had submitted their estimates, the members
completed a questionnaire asking them to submit an individual estimate
as to the “number of dots that they personally really thought there were
on the slide” and to respond to questions designed to measure group
satisfaction.

Following the completion of the questionnaire, the nature of the study
was explained in detail and the naive Ss were asked to reveal if they
had suspected that a member of their group was collaborating with E. It
became necessary to discard five groups on the basis of these responses.

In accordance with the 2 × 4 factorial design involving
incubation and group size, respectively, those
groups in which a recess or incubation period was 
formalized were instructed that after the first 4 min. 
of discussion the group members were to recess in 
silence for 3 min. and review privately the preceding 
discussion. (In pilot studies it had been found that 
no group completed the problem in less than 4 min.) 
After the 3 min. recess, the groups were instructed 
to resume their discussion and submit a group de- 
cision in a maximum of 11 min. 

Measures. The dependent variables included meas- 
ures of influence and group satisfaction. The four 
measures of influence were: (a) self report, (b) the 
mean of the members’ indices of estimate change,
(c) group decision error, and (d) mean error of the 
group members’ post-group-decision estimates. 

The first influence measure was derived from the group mean of the
weighted responses to the following item: “I changed my individual
estimate of the number of dots a great deal as a result of the group
discussion.” The alternatives were arranged on a 6-point scale varying
from “agree very much” to “disagree very much.” This index represents
the individual member’s perceived change. While it was quite possible
that this phenomenological measure would not correlate highly with some
of the other behavioral measures, it was included since it reflects, at
the very least, resistance to persuasion from S’s point of view.
(Actually the correlation coefficient between self-report and “members’
change” was found to be 0.50. See Table 1.)

The second measure of influence was derived from the ratio of the
difference between the accuracy of the individual’s prediscussion and
postdiscussion estimates to the difference between the prediscussion
estimate and the correct answer. The numerator represents the degree of
change toward the correct answer and the denominator the degree of
change possible. An upper limit of 1.00 is inherent in the index and a
lower limit of zero was imposed. The resulting ratio was subjected to
the arc sine transformation. The mean of the members’ transformed
indices provided the index of group estimate change.
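The index of estimate change can be sketched in Python as follows. Two details are assumptions rather than statements from the text: the arc sine transformation is taken to be the common arcsine square-root form, and a member whose prediscussion estimate was already correct (so that no change was possible) is scored zero. The function names are illustrative only.

```python
import math


def estimate_change_index(pre: float, post: float, correct: float) -> float:
    """Share of the possible movement toward the correct answer, limited to [0, 1]."""
    possible = abs(pre - correct)              # denominator: degree of change possible
    if possible == 0:
        return 0.0                             # assumption: nothing to change, score zero
    achieved = possible - abs(post - correct)  # numerator: reduction in error
    return max(0.0, min(1.0, achieved / possible))


def arcsine_transform(p: float) -> float:
    """Arcsine square-root transform in radians (assumed form of the transformation)."""
    return math.asin(math.sqrt(p))


def group_change_index(members, correct):
    """Mean of the members' transformed indices; `members` holds (pre, post) pairs."""
    values = [arcsine_transform(estimate_change_index(pre, post, correct))
              for pre, post in members]
    return sum(values) / len(values)
```

A member who moves from 650 to 850 when the correct answer is 1,050 has closed half of the initial error and so receives an untransformed index of 0.5; a member who moves away from the correct answer is held at the imposed lower limit of zero.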

The third measure of influence was simply the difference between the
group decision and the correct answer (the advocate’s position), and may
be interpreted as representing the extent to which the group resisted
the advocate’s arguments. Because of the heterogeneity of these scores,
it was necessary to apply the cube root transformation.

The fourth measure of group influence was very 
similar to the preceding index but the individual 
group member’s post-group-decision estimate was 
substituted for the group decision. The mean abso- 
lute error of the group members was calculated and 
again the cube root transformation was applied. 
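The two error measures can be sketched in Python as below. The sketch assumes that the group decision error is taken as an absolute difference before the cube root is applied; the function names are illustrative only.

```python
CORRECT = 1050  # correct number of dots on the slide


def cube_root(x: float) -> float:
    """Cube root of a non-negative error score."""
    return x ** (1.0 / 3.0)


def group_decision_error(group_estimate: float, correct: float = CORRECT) -> float:
    """Third measure: transformed distance of the group decision from the correct answer."""
    return cube_root(abs(group_estimate - correct))


def mean_member_error(post_estimates, correct: float = CORRECT) -> float:
    """Fourth measure: cube root of the members' mean absolute post-decision error."""
    mean_abs = sum(abs(e - correct) for e in post_estimates) / len(post_estimates)
    return cube_root(mean_abs)
```

The cube root compresses large errors far more than small ones, which is the usual reason for applying it to strongly heterogeneous scores.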

The following items comprised the questionnaire regarding group
satisfaction:

1. I was extremely satisfied with the quality or degree of excellence of
the decisions reached by my group.

2. Consider the entire problem solving session: My opinion was given the
utmost consideration.

3. This has been the most stimulating experience I've had in a long
time.

4. This experience has greatly increased my respect for the group method
of doing things.

5. If I could have chosen the other members of the group myself, I
couldn’t have done a better job.

Again the alternatives were arranged on a 6-point scale ranging from
“agree very much” to “disagree very much.”
Table 1
Correlations among Persuasion Measures

                              Members’    Members’ Error    Groups’ Error
                              Change      of Estimate       of Estimate

Self-Report                   .50*        .05               .07
Members’ Change                           .24               .41*
Members’ Error of Estimate                                  .61*

* p (the significance level is illegible in the source).


Table 2

Group Persuasion (Four Measures) in Relation to Group Size and Incubation


[Table body: Measures A to D by group size under the incubation and
no-incubation conditions, with combined means. Most cells are illegible
in the source; the legible entries are given as extracted.]

Incubation Period
40 509 5.7
$6 Bia 88
34 30.1 7.6
(7) 46 583 64 43
Combined Meanᵃ 3.9
45 69 5.8

No Incubation Period / Combined Mean
54.2 3.5
48 54.6 5.0
45 43.4 4.7
48 66.7 a 4.0
47 53.9 5. 4.4

ᵃ Results of Analysis of Variance: Incubation effects are significant at
the .05 level of confidence with regard to Measures A, C, and D
(F = 6.07, 4.44, and 4.15, respectively). Size effects are significant
at about the .05 level of confidence only with respect to Measure C
(F = 2.77).

Note. In Measures A (self-report) and B (members’ estimate changes) a
high score represents greater change or persuasion. In Measures C (error
of members’ post-group-decision estimate) and D (error of group
estimate), a low score represents greater change or persuasion.


Results 


The correlation matrix involving the four measures of influence is given
in Table 1. While none of the correlation coefficients are high, all
four measures were retained on the basis of the assumption that the
different measures of influence present various facets of the phenomenon
under investigation.


Table 3
Analysis of Variance of Group Satisfaction (Questionnaire) in Relation
to Group Size and Incubation

Mean scores (entries for the individual group sizes are largely
illegible in the source and are given as extracted):

                        Group Size              Combined Mean
Incubation Period       …                       4.7
No Incubation Period    …                       4.9
Combined Mean           5.1  …  …  1.8

Source of Variation               df      MS       F
Group Size                        …       5.00     3.38
Incubation                        …       2.53     1.71
Question                          …       64.80    43.78
Size × Incubation                 …       4.88     3.30
Size × Question                   …       1.18     .80
Incubation × Question             …       .24      .16
Size × Incubation × Question      12      .65      .44
Within Groups                     160     1.48

Note. In arriving at the figures in each cell the mean weighted
responses of the members of each group to the five questionnaire items
was calculated, and a mean group score was determined. Finally, the mean
of the five groups (randomly selected) in each cell was calculated. The
score itself is the mean for the groups within a cell.

In a 2 × 4 analysis of variance design with
an unequal number of groups in each cell 
(Snedecor, 1946), the relationships among 
the experimental conditions (incubation-no 
incubation and four variations of group size) 
and the four measures of influence were ana- 
lyzed (see Table 2). 

With regard to self-reports (Measure A), 
the mean error of the member’s post-group- 
decision estimate (Measure C), and group- 
decision error (Measure D), the no-incuba- 
tion condition is associated, contrary to ex- 
pectations, with greater influence or change 
in opinion (p = .05). 

The results with regard to group size were 
statistically significant only with respect to 
the mean error of the group member’s post- 
group-decision estimate (Measure C). Here 
it was seen that in the two-person group (a 
single naive S), the advocate was most effec- 
tive. It was also noted that the five-person 
groups were somewhat more persuaded in 
terms of this criterion. Moreover, with re- 
gard to each of the four measures of influ- 
ence, the two- and five-person groups were 
influenced more than the three- and four- 
person groups. 

The results concerning group satisfaction were explored by means of a
2 × 4 × 5 analysis of variance design (Snedecor, 1946). (See Table 3.)
Here, the variance attributable to group size was significant at the
0.05 level of confidence. Inspection of the combined means revealed that
the members of two- and five-person groups expressed greater
satisfaction, in general, than the members of three- and four-person
groups. Moreover, the interaction effect between group size and
incubation was statistically significant (p = .05).


Discussion 


The results of the experiment indicate that 
group size and incubation are related to the 
degree to which the group tends to accept 
the suggestions of the most informed group 
member. However, the results with regard to 
group size were in accord with the original 
hypothesis only with reference to two-, three-, 
and four-person groups. While the three- 
and four-person groups in comparison with 
two-person groups tended to be less accurate 
in their estimates, changed their individual 
estimates less, and expressed less satisfaction 
with the group; the five-person groups in com- 
parison with three- and four-person groups 
tended to change more, were more accurate, 
and expressed greater satisfaction. 

The unexpected nature of these results is 
further emphasized by the contradictory re- 
sults of an earlier study (Ziller, 1957) which 
did not employ accomplices but involved a
similar dot counting task and two-, three-, 
four-, and five-person groups. In the ref- 
erenced study, the three- and four-person 
groups, in general, expressed greater satisfac- 
tion with their groups than two- and five- 
person groups. Moreover, in this same study, 
the five-person groups submitted less accurate
estimates of the correct number of dots
than other groups. Thus, the results of the 
present study, particularly those results con- 
cerning five-person groups, can scarcely be 
attributed to size effects alone. 

By way of interpretation of these unexpected results it was assumed, for
analysis purposes, that the advocate in the present study was perceived
as a deviate when all the original estimates of the naive members of a
group were lower or higher than that of the advocate. In the three-,
four-, and five-person groups, respectively, the advocates were actually
found to be the deviates (as defined above) 47, 40, and 17% of the time.
Thus, there appears to be a tendency for the advocate in the five-person
group to find himself less frequently defending an extreme position
relative to the other group members. Or again, more simply, there exists
a greater probability that the scale of judgment of the larger group
includes the judgment of the advocate, who, being perceived as a
moderate rather than a deviate, may find that the presumed tendency of
groups to compromise is in his favor. The results with regard to
satisfaction also would seem to support this position. Unfortunately,
the limited number of groups in the experiment precludes this ad hoc
analysis. Nevertheless, the results may be interpreted as demonstrating
again the unusual characteristics of three- and four-person groups with
regard to the probability of coalitions (Mills, 1953).
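The definition of a deviate used in this analysis can be written as a short Python check. The second function is a purely illustrative assumption, not part of the study: it treats each naive member as falling below the advocate's position independently with a fixed probability, to show why the chance of all members falling on one side shrinks as the group grows.

```python
def advocate_is_deviate(naive_estimates, advocate_estimate=1050):
    """True when every naive member's original estimate lies on the same
    side (all lower or all higher) of the advocate's position."""
    all_lower = all(e < advocate_estimate for e in naive_estimates)
    all_higher = all(e > advocate_estimate for e in naive_estimates)
    return all_lower or all_higher


def chance_all_one_side(n_naive: int, p_lower: float = 0.5) -> float:
    """Hypothetical model: probability that n independent naive estimates all
    fall below, or all fall above, the advocate's position."""
    return p_lower ** n_naive + (1.0 - p_lower) ** n_naive
```

Under this hypothetical model with `p_lower = 0.5`, groups with two, three, and four naive members give probabilities of .50, .25, and .125, which decline in the same direction as the observed 47, 40, and 17%.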

With regard to the incubation variable, the results were diametrically
opposed to the initial hypothesis. The most knowledgeable member was
more effective under the no-incubation condition, and the group members
also expressed greater satisfaction with the group products and
processes under this condition.


Experiments concerning incubation in indi- 
vidual problem solving situations initially 
suggested that interference resulting from 
emotional causes and sets is dissipated in 
time, thus permitting the acceptance of the 
well informed member’s (advocate) argu- 
ments. The “sleeper effect” represented this 
theoretical position. The discrepancy between 
these earlier findings and the findings of the 
present experiment may be explained, in part 
at least, by assuming that since there was 
only a single encounter between the advocate 
and the group in the earlier experiment there 
was less reason for Ss to perceive the recess 
as an opportunity for a private rehearsal of 
arguments in opposition to the advocate. 
Thus, in the present experiment in which the 
recess was followed by further discussion, the 
recess may have served to instigate rather 
than dissipate emotional sets. 


Summary 


The experiment reported here investigated
some conditions under which a group fails to
utilize the resources of the most knowledgeable
group member in a decision making
situation. By the experimental expedient of 
including an accomplice in each group who 
was informed as to the correct answer and a 
correct method of arriving at that answer, it 
was found that two- and five-person groups 
tended to be more accurate and more influ- 
enced, and expressed greater satisfaction with 
their groups than three- and four-person 
groups; and that groups in which no incuba- 
tion period was imposed were also more ac- 
curate and more influenced, and expressed 
greater satisfaction with their groups. 

The results with regard to the five-person 
groups were tentatively attributed to the in- 
creased scale of judgment in larger groups 
which presumably reduces the probability 
that the advocate must defend an extreme 
position relative to the other group members. 
With regard to the incubation effects, it was
proffered that during the recess the naive Ss
prepared for a second defense of their opinions
by revitalizing their initial sets and
reestablishing their initial arguments.


Individuals are reluctant to use words or
phrases of negative emotional tone as descrip- 
tions of self or of others. For this reason, 
respondents may resist forced-choice items 
which present only unfavorable alternatives. 
Some inventories attempt to reduce this re- 
sistance by requesting the choice of the least 
descriptive rather than most descriptive phrase 
for negative items, but there is no evidence 
concerning the effect of this procedure. Since 
negative items continue to be used in rating 
scales and inventories, it is apparently be- 
lieved that such items play an essential role 
in obtaining valid descriptions. The evidence 
on this matter also is inconclusive. 

In a study which compared several forms 
of a forced-choice rating scale (Highland & 
Berkshire, 1951) a form composed entirely 
of favorable items seemed adequate, since 
this form showed low biasability, adequate 
reliability and validity, and was favored by 
the users. Also in the area of ratings, Wherry 
(1951) showed that while raters actually be- 
lieve positively toned items to be more cru- 
cial to job success and more universally ap- 
plicable, these beliefs were only partially sup- 
ported by the validity data in his factor 
analysis of rating item indices. In the area 
of self description, Krug (1958) showed that 
judgments of unfavorable adjectives were as 
reliable as judgments of positive terms, and 
that responses to negative pairs appeared less 
influenced by the relative favorability (PI) 
of the members of a pair. It was suggested 
that S is more highly motivated to make an 
accurate self-description in the case of nega- 
tive pairs, since the risk appears greater. 

The present study employed judgment time as the dependent variable in an
effort to assess the effects of (a) the favorable-unfavorable dichotomy,
(b) the response S is requested to make (least or most descriptive), and
(c) the degree of PI discrepancy within a pair.


Procedure 


The Ss were 64 senior division men in a college of 
engineering. Their participation in a one-hour in- 
dividual laboratory session was voluntary. 

From a list of 228 adjectives for which the Selec- 
tion Set Preference Index (Krug, 1958) was avail- 
able, 40 forced-choice pairs were constructed. Twenty 
pairs contained two favorable words, the other 20 
being composed of unfavorable words. In each set 
of 20, five pairs contained members identical in PI, 
five had a PI discrepancy of .20 (on a 7.0 scale), 
five a discrepancy of .75, and five a discrepancy of 
1.50. The standard deviation of the PI was matched 
as closely as possible within a pair. 
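The composition of the 40 pairs follows a 2 × 4 layout with five pairs per cell. A small Python sketch makes the counts explicit; the dictionary keys and the `replicate` index are illustrative names, not terms from the study.

```python
from itertools import product

FAVORABILITY = ("favorable", "unfavorable")
PI_DISCREPANCY = (0.0, 0.20, 0.75, 1.50)  # intrapair PI discrepancy levels
PAIRS_PER_CELL = 5


def build_design():
    """Enumerate one record per forced-choice pair in the 2 x 4 x 5 layout."""
    return [
        {"favorability": fav, "discrepancy": disc, "replicate": rep}
        for fav, disc in product(FAVORABILITY, PI_DISCREPANCY)
        for rep in range(1, PAIRS_PER_CELL + 1)
    ]


design = build_design()
print(len(design))
```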

The S was seated at a table which was isolated from E by a large black
shield in which a 6 × 10-in. flash glass screen was mounted at S’s eye
level. The pairs of adjectives were projected individually onto this
screen from a Bell and Howell automatic slide projector. The
presentation of a pair was preceded by a 2-sec. warning light which was
the signal for S to attend to the screen. The onset of the pair started
an electronic chronoscope, which was stopped when S moved one of the two
switches mounted on the table before him. Moving the right-hand switch
indicated that the right-hand word was the choice; moving the left-hand
switch indicated a choice of the word appearing on the left. For the
pairs with PI discrepancies greater than zero, the left-hand word had
the higher PI in half of the cases. The sequence of pairs was
randomized; each S was assigned one of 16 random sequences.

In addition to instructions designed to familiarize S with the apparatus
and the sequence of events, the following instructions were given
relevant to the judgmental task: “The research we are conducting deals
with personality test items. The items will be presented to you one at a
time and each will consist of two adjectives. Your task is to decide
which of the two adjectives is a more accurate description of you. This
may not always be easy; there may be pairs such that both members will
be adequate descriptions of you, and others where you will feel that
neither are. Nonetheless, you are to make a choice in each case. The
task is not a speed test, and we are not asking for snap judgments. A
quick response is not better than a slow one. You should treat each item
just as you would if you encountered it on a printed personality
inventory. However, we are interested in judgment time, and it is
important that you respond as soon as your decision is reached.
Remember, the most important thing is that we want to obtain an accurate
self-description, so please make your choices as seriously as possible.”
Ss were randomly assigned to one of two response conditions. For 32 Ss,
the response was in terms of the most descriptive adjective. For the
remaining 32, the instructions were modified to request the choice of
the least descriptive term in each pair.¹


Table 1
Summary of Analysis of Variance

Source             df     MS        F              Error

R                  1      4.7860    2.996          (a)
(a) Ss within R    62     1.5977

D                  3      .2212     [illegible]    (b)
F                  1      .0870     2.990          (c)
D × F              3      .1254     5.098***       (d)
R × D              3      .0872     3.618*         (b)
R × F              1      .2202     7.567**        (c)
R × D × F          3      .1645     6.687***       (d)
(b) D × Ss         186    .0241
(c) F × Ss         62     .0291
(d) D × F × Ss     186    .0246

* p = .05.
** p = .01.
*** p = .001.


Analysis of Results 


The time in milliseconds was recorded for 
each response, and a log transformation ap- 
plied. The mean log time for each S for the 
five pairs at each treatment-level combina- 
tion provided the basic data for analysis. 
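The reduction of the raw times to one score per S per treatment combination can be sketched in Python as follows. The base of the logarithm is not stated, so base 10 is an assumption here, and the tuple layout of a trial record is hypothetical.

```python
import math
from collections import defaultdict


def mean_log_times(trials):
    """Average the log-transformed judgment times within each cell.

    `trials` is an iterable of (subject, favorability, discrepancy, time_ms)
    records; the result maps (subject, favorability, discrepancy) to the
    mean log10 time over the pairs in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subject, favorability, discrepancy, time_ms in trials:
        key = (subject, favorability, discrepancy)
        sums[key] += math.log10(time_ms)  # assumed base-10 logarithm
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Note that the mean is taken after the transformation, as the text describes, so the cell score is the log of the geometric mean of the raw times rather than the log of their arithmetic mean.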

¹ To correspond with the usual inventory practice, least choices should
be made to the unfavorable pairs, and most choices to the favorable
pairs. Pilot work indicated that the intrusion of the specific choice
instruction just before the pair appeared led to an anticipatory set
which affected judgment time. The procedure actually employed seemed
preferable to the other alternative of presenting the two sets of 20
pairs separately.

Since each S responded at all levels of PI discrepancy (D) for both
favorable and unfavorable pairs (F) using one of the response conditions
(R), the appropriate analysis of variance design is Lindquist’s mixed
design, Type VI (Lindquist, 1956, pp. 292-297). Table 1 presents the
summary of this analysis. Of the main effects, only D is statistically
significant; all second-order interactions are significant at the .05
level or beyond, and the R × D × F interaction is significant at the
.001 level. Figure 1 illustrates this third-order interaction; the
figure also serves as a table of means from which other effects may be
plotted.
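Each F ratio in Table 1 is the mean square of an effect divided by the mean square of the error term listed for it. The Python sketch below recomputes the ratios from the mean squares as they can be read in the table; the pairing of each value with its row is a reading of a damaged table and should be treated as an assumption. The F for D is illegible in the source, so the value this sketch produces for D (about 9.18) is computed here, not quoted.

```python
# (effect, mean square, error-term label), as read from Table 1
EFFECTS = [
    ("R", 4.7860, "a"),
    ("D", 0.2212, "b"),
    ("F", 0.0870, "c"),
    ("D x F", 0.1254, "d"),
    ("R x D", 0.0872, "b"),
    ("R x F", 0.2202, "c"),
    ("R x D x F", 0.1645, "d"),
]

# Mean squares of the error terms: (a) Ss within R, (b) D x Ss,
# (c) F x Ss, (d) D x F x Ss
ERROR_MS = {"a": 1.5977, "b": 0.0241, "c": 0.0291, "d": 0.0246}


def f_ratios(effects=EFFECTS, error_ms=ERROR_MS):
    """Return {effect: mean square / mean square of its error term}."""
    return {name: ms / error_ms[err] for name, ms, err in effects}


for name, value in f_ratios().items():
    print(f"{name:10s} F = {value:.3f}")
```

The recomputed ratios agree with the legible F values to three decimals, which supports the row assignments used above.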
Discussion 

An implicit assumption of the study was 
that judgment time might be taken as an 
index of S’s willingness to respond to a forced- 
choice item. This assumption is supported 
by the general decrease in judgment time as 
PI discrepancy increases. It must be noted, 
however, that the over-all curve is not a con- 
sistently decreasing one; mean time at the 
third level of D is slightly greater than at the 
second level. On the surface, this suggests that the pairs chosen to
represent D₂ and D₃ do not adequately represent the intended dimension.
That this is not the explanation will be demonstrated in the discussion
of the interactions. For the moment, we wish to state simply that the
significant effect of D indicates that the dependent variable possesses
relevance and that our results are generally consistent with the
previous finding of a relationship between decision time and stimulus
difference (Festinger, 1943).

The hypothesis that Ss are reluctant to re- 
spond to pairs containing only negative al- 
ternatives is not supported. Not only is the 
effect of F nonsignificant, but the observed 
difference is in the opposite direction. Past 
evidence concerning such resistance is pri- 
marily anecdotal, based on comments of re- 
viewers and respondents. The data of this 
study suggest that such commentary tells us 
very little about the actual behavior of Ss in 
the forced-choice test situation. 

Mean times for least descriptive choices are consistently higher than
for most descriptive choices, but the difference is not statistically
significant.² In view of this, comment concerning the practice of asking
for differential responses on inventories will be deferred until the
interactions have been considered.

The interaction which is so evident in Fig. 1 is a surprising one. Two
curves (favorable words, least descriptive responses; and unfavorable
words, most descriptive responses) show a consistent downward trend. The
remaining two curves show a marked increase at the third level of PI
discrepancy. These peaks cannot be attributed to the particular sample
of pairs, nor to inadequate estimates of PI discrepancies, since each
sample of pairs behaves properly under one of the response conditions.
Since different Ss are involved in the two response conditions, one
might invoke this difference to account for the observed interaction.
However, the assignment of Ss to form two homogeneous groups with such
different response characteristics is viewed as extremely improbable. It
would seem that we are faced with an interaction which is real rather
than artifactual, but which is rather difficult to explain.

² Individual Ss use different portions of the time scale, and these
absolute differences are sufficient to negate the apparent difference
between the two classes of response. It should be noted, however, that
Ss are not differentially affected by the D and F variables; the mean
squares for D × Ss, F × Ss, and D × F × Ss are of a magnitude which
warrants the assumption that these reflect pure error.

We might ask which, if either, of the sets of curves (the peaked ones or
the consistently decreasing ones) are associated with response bias. An
analysis of the frequency with which the more favorable member of the
pair was selected at the third level of D failed to give a consistent
answer.³ For unfavorable pairs, there is no difference between most and
least responses (proportions of .65 and .66). For favorable pairs, a
significantly greater proportion (.65) chose the more favorable word
under the most condition than under the least condition (.53). In other
words, the bias which is expected for pairs with a PI discrepancy of .75
characterizes three of the four relevant points in Fig. 1. The only
condition which does not show the expected bias is least response,
favorable pairs. This point is on a consistently decreasing curve.

The peaked curves are associated with the 
response conditions which prevail on many 
forced-choice inventories. It must be ob- 
served that this finding does not challenge the 
adequacy of such a procedure, provided that 
the intrapair PI discrepancy is zero. In fact, 
the four points of Fig. 1 which represent a 
zero discrepancy support the standard pro- 
cedure. However, if PI matching is inade- 
quate, the introduction of the differential re- 
sponse complicates matters considerably. The 
reason for this complication is not known. One 
might postulate an inhibition factor which in- 
teracts with R and F, and which becomes im- 
portant when pair members are discriminably 
different on a favorability dimension (S sus- 
pects a trick, and reconsiders his response), 
but our data cannot test this. What is dem- 
onstrated is that for two combinations of pair 
favorability and response, there is a level of 
PI discrepancy which is associated with in- 
creased response time. In this study, this 
level approximates one standard deviation of 
the Preference Index.⁴


³ In this discussion, the reference is to the term preferred as a
description; i.e., the term chosen in the most descriptive set, and the
term not chosen in the least descriptive set.

⁴ PI sigmas varied from .50 to 1.1; an average standard deviation would
be near .75, which was the discrepancy at Level 3.


Summary and Conclusions


1. In general, the time required for S to choose one of the words in a
forced-choice pair decreases as the intrapair PI discrepancy increases.

2. Ss use no more time in responding to 
unfavorable pairs than to favorable ones. 
The reputed resistance to unfavorable alterna- 
tives does not appear. 

3. A complex relationship is demonstrated 
between PI discrepancy, the favorability of 
the pair and the type of response which is to 
be made. When S chooses the least descriptive
term of two unfavorable words, or the
most descriptive of two favorable words, in- 
creased judgment time is associated with an 
intermediate level of intrapair PI discrepancy. 

4. As a means of controlling bias, adequate pairing of forced-choice
terms may be defined as a zero discrepancy within the pair. If a
forced-choice inventory contains pairs which are not adequate by this
definition, no satisfactory recommendation is available concerning the
type of response to be requested. In a limited sense, it would be
preferable to use one (either most or least) rather than two types of
response, given variance in PI discrepancy. The adequate solution is to
have no such variance.

Robert E. Krug and Doris Northrup


Factors which introduce biases into intelligence tests have been of
concern to psychologists for some years. Socioeconomic status has been
demarcated as one important source of such bias. This recognition has
stimulated efforts to purify existing tests or develop new ones which
are less subject to such bias.

The Lowry Reasoning Test Combination¹
is one such effort. It employs stimulus ma- 
terials common to all status groups, thereby 
eliminating most social status bias. In addi- 
tion, it is brief, easily administered, and mini- 
mizes the verbal aspect. 

The test consists of two sets of 25 questions. Stimulus items for both
sets are drawn from constructs which are presumably familiar to all
individuals beyond childhood in our culture, i.e., days of the week,
squares, and matchsticks. Variance in concept difficulty is obtained by
altering combinations while simultaneously maintaining a relatively
constant level of word difficulty.

The first group of questions involves rea- 
soning problems using the days of the week 
in a variety of ways. They are so phrased 
that there is an increase in difficulty with each 
succeeding question without any concomitant 
increase in the difficulty of the verbal mate- 
rial needed to understand the problems. In 
this way the verbal symbolism remains stated 
in simple words while S’s reasoning ability is 
put to further test. 

¹ Test copyrighted 1956 by Ellsworth Lowry and Omer Lucier. Copies are
available from the latter, 1711 Walnut Street, Philadelphia.

Printed directions inform S that: “Each answer is a day of the week.
Remember that Sunday is always the first day of the week.” The days of
the week then appear in a numbered, horizontal list so that this
information
is always before S as he works on this series 
of questions. To make sure S understands 
what is required of him, four sample ques- 
tions are given. Two of them are: “If to- 
day were Saturday, what would tomorrow 
be?” “If today were the first day of the 
week, what was yesterday?” The first of the 
nonpractice questions is, “If today were the
third day of the week, the day after tomorrow
will be what day?” The last of these
questions is, “If the odd days of the week 
came first, in order, then the even days, and 
if then the order were reversed, and if Mon- 
day were the first day of the week, what 
would be the fifth day of the week?” 
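The simpler day-of-week items reduce to modular arithmetic on the numbered list of days. The Python sketch below works the two sample questions and the first nonpractice question under the test's convention that Sunday is the first day; the helper names are illustrative, and the final, multi-step item is deliberately not attempted.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]  # Sunday is the first day


def nth_day(n: int) -> str:
    """Name of the n-th day of the week, counting Sunday as 1."""
    return DAYS[(n - 1) % 7]


def shift(day: str, offset: int) -> str:
    """Day reached by moving `offset` days forward (negative for backward)."""
    return DAYS[(DAYS.index(day) + offset) % 7]


print(shift("Saturday", 1))   # "If today were Saturday, what would tomorrow be?"
print(shift(nth_day(1), -1))  # "If today were the first day ..., what was yesterday?"
print(shift(nth_day(3), 2))   # third day of the week, the day after tomorrow
```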

The earlier and simpler questions are in- 
tended to serve as learning situations for the 
later and more difficult combinations. The 
later items are almost impossible to solve 
without first attempting to solve all preced- 
ing problems. 

The second group of questions also starts 
with simple problems which progress in diffi- 
culty and require the solution of earlier items. 
This group consists of squares drawn by non- 
touching lines. The S is first shown three 
adjacent, numbered squares, made of simple 
lines in a vertical ladder-like sequence. He is 
asked to imagine that the squares were made 
with matchsticks. “How many matches must
be removed so that the square numbered 1
will be entirely gone, but the other two 
squares will remain complete?” Two other 
practice problems based on the same design
are given.
The first and second questions of the 25
problems depend upon three similarly made, 
but differently arranged, squares. Squares 
numbered 1 and 2 are horizontal with the 
third square directly beneath the second. The 
S is asked, “How many matches must be re- 
moved so that Square Number 2 will be elimi- 
nated—be entirely gone—leaving the other 
two complete,” and “By removing two 
matches, only, which square can be elimi- 
nated.” 
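The matchstick items can be modeled by treating each square as a set of four unit edges and counting the edges that belong to the target square alone. The Python sketch below rests on one assumption not spelled out in the text, namely that adjacent squares share a single match along their common side; the grid coordinates chosen for the two layouts are likewise illustrative.

```python
def square_edges(col: int, row: int) -> set:
    """Four unit edges of the square whose lower-left corner is (col, row)."""
    return {
        ("h", col, row),      # bottom
        ("h", col, row + 1),  # top
        ("v", col, row),      # left
        ("v", col + 1, row),  # right
    }


def matches_to_eliminate(target, layout) -> int:
    """Matches to remove so that `target` is entirely gone while every
    other square in `layout` remains complete."""
    kept = set()
    for name, position in layout.items():
        if name != target:
            kept |= square_edges(*position)
    return len(square_edges(*layout[target]) - kept)


# Practice design: three squares in a vertical ladder, Square 1 on top.
ladder = {1: (0, 2), 2: (0, 1), 3: (0, 0)}

# First test design: Squares 1 and 2 side by side, Square 3 beneath Square 2.
second_design = {1: (0, 1), 2: (1, 1), 3: (1, 0)}

print(matches_to_eliminate(1, ladder))
print(matches_to_eliminate(2, second_design))
```

Under the shared-match assumption, Square 2 in the second design is the only square with exactly two matches of its own, which fits the wording of the question about removing two matches only.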

The last seven items are based on a design
composed of 11 squares arranged in three rows. The
first row of two squares is depicted atop a 
second row of four, so that Squares 1 and 2 
(the first row) are centered directly above the 
midmost squares (four and five) of the second 
row. The last row consists of five squares 
shown as evenly aligned with the second row, 
save that the last square, Number 11, has no 
block above it since the second row consists 
of only four squares. The last two questions 
ask, “What is the largest sum possible of 
three squares that can be eliminated by re- 
moving three matches?” and “What is the 
smallest sum of three squares that can be 
eliminated by removing three matches?” 
Once again increased difficulty is obtained 
through varying the complexity of the de- 
signs and task while the difficulty of the sym- 
bols remains constant. 

As can be seen, both types of questions 
start with items so simple that a child could 
solve them. They proceed in difficulty so 
that even sophisticated and intelligent adults 
find the last items challenging. Both sub- 
tests are timed. Fifteen minutes are allowed 
for the first group of questions and 20 minutes 
for the second. Even though these time limits 
are provided, the items constitute a power 
test. 

Research with the Lowry test has shown 
that it measures much the same abilities as 
the California Test of Mental Maturity and 
is less influenced by social status bias (Lucier 
& Burnette, 1957). Similar results were 
obtained when the Lowry and Cooperative 
School and College Abilities Test were com- 
pared with each other and in relation to so- 
cial status bias (Lucier & Farley, 1957). 
More recently, it has been used with adults 
in comparison with the general technical score 
(GTS) of the Army Classification Battery.
This score is taken frequently as being equiva- 
lent to a measure of general intelligence. A 
correlation of .70 was obtained. ACB sub- 
tests which seemed related to the commonly 
accepted measures of intellectual functioning, 
i.e., reading and vocabulary, arithmetic rea- 
soning, and pattern analysis were also ex- 
amined against the Lowry test. Correlations 
of .66, .63, and .55 respectively were secured. 
The Lowry test was also found to correlate
.45 against ratings of performance of ob-
server-recorder personnel in the Quartermas- 
ter Corps, whereas the GTS achieved a cor- 
relation of .34. All of these correlations, with 
the exception of the last, were significant at 
the .01 level. The last reached significance 
at the .05 level. It was concluded that the 
Lowry Reasoning Test Combination was more 
efficient in identifying soldiers capable of ob- 
server-recorder duties than were components 
of the ACB which were usually used. The 
Lowry test was also found to be more status 
free than the GTS (Andrews, Lebo, & Lucier, 
1959). 


Summary


The Lowry Reasoning Test Combination 
has been found to be relatively free of social 
status bias and to measure intellectual func- 
tion. It is easily administered and simply 
scored and does not depend upon a high level 
of verbal ability. Variance in concept diffi- 
culty is obtained by altering combinations of 
constructs while keeping the verbal material 
on a uniformly simple level. Wherever such 
a discriminative and effective selection device 
is needed the present writers would recom- 
mend that the Lowry test be tried. 


Experimental investigations of problem solv-
ing have identified several variables which in- 
terfere with performance on problem solving 
tasks, and others which facilitate perform- 
ance. Among the former variables are pre- 
liminary experience with the components of 
a problem in a different context from the 
test problem (Birch & Rabinowitz, 1951), 
failure (Lazarus & Eriksen, 1952), frustra- 
tion (Mohsin, 1954), and rigidity (Luchins, 
1942). Among the latter variables are in- 
structions including the phrase “don’t be 
blind” (Luchins, 1942), praise (Cowen, 
1952), instructions to be “clever” (Christen- 
sen, Guilford, & Wilson, 1957), and train- 
ing in creative problem solving emphasiz- 
ing the brainstorming procedure (Meadow & 
Parnes, 1958). The present paper describes 
the results of an experiment designed to 
evaluate further the effects of the brainstorm- 
ing method on creative problem solving. 

In the brainstorming procedure S segregates 
in time the formation of a solution and the 
judgment of its efficacy or value. The S is 
encouraged to express any possible solution 
which comes to mind during the initial phase, 
postponing the evaluation of the solution to 
a later time (Osborn, 1957). Meadow and 
Parnes (1958) found that Ss who had taken 
a one-semester course in creative problem 
solving, which emphasized the brainstorming 
procedure, were significantly superior on five 
of seven measures of creative ability to a 
group of matched control Ss who had not 
taken the course. This difference in perform-
ance cannot be unequivocally attributed to
the difference in training in the brainstorming 
procedure, because experimental and control 
groups also differed in other variables em-
phasized in the creative problem solving
course, which involved constant practice with
problems requiring creative ability.

1 This research was financed by a grant from the
Creative Education Foundation.

The present experiment was designed as a 
test of the effectiveness of the brainstorming 
procedure, using only Ss who were members 
of the course in creative problem solving in 
order to control for the amount of previous 
experience in the various problem solving 
methods. Each S was given two problems
which required creative ability, in two test- 
ing periods. One problem was administered 
under brainstorming instructions, which al- 
lowed Ss to formulate possible solutions with- 
out evaluating them; the other problem was 
administered under nonbrainstorming instruc- 
tions, which required Ss to formulate and 
evaluate solutions simultaneously. The qual- 
ity of the solutions was later evaluated by a 
trained rater. It was expected, on the ba- 
sis of previous experimentation (Meadow & 
Parnes, 1958), that more solutions of good 
quality would be produced under the brain- 
storming instructions than under the non- 
brainstorming instructions. 


Method 


Subjects. The Ss were 32 college students from 
two courses in creative problem solving, one given 
at the University of Buffalo and the other at
McMaster University. The experiment was conducted
during the final two weeks of the semester. The Ss 
were randomly divided into four experimental groups, 
each containing eight Ss. 

Experimental problems. Two problems, the Hanger 
problem and the Broom problem, were selected from 
the AC Test of Creative Ability. Since the AC 
Test is reported to differentiate “creative” from 
“noncreative” Ss (Harris & Simberg, 1954), the two 
problems selected presumably require creative abil- 
ity. The Hanger problem was used in a previous 
experiment, in which it was found that Ss trained 
in brainstorming performed at a significantly higher 
level on the problem than nontrained Ss (Meadow 
& Parnes, 1958). The problem required Ss to list 
other uses for a hanger or a broom. 

Procedure. The Ss were given first one problem,
and then the second immediately thereafter. All 
tests were group administered to each of the four 
groups separately. One problem was given under 
brainstorming instructions and the other under non- 
brainstorming instructions. The essentials of the 
brainstorming instructions were as follows: “Brain- 
storm to your fullest ability; forget about quality 
entirely. We are going to count only quantity on 
this test. . . . Quality is of no concern at all.” The 
essentials of the nonbrainstorming instructions were: 
“Forget all about brainstorming. Strive completely 
for quality. We want to see how many good ideas 
you can produce in a certain amount of time. You 
are going to be penalized for any bad ideas. Any 
ideas rated as poor will be subtracted from your 
score. . . .” The Ss were allowed 5 min. for each
problem. 

Half of the Ss were given first the Hanger prob- 
lem and then the Broom problem; the other half 
were given the problems in the reverse order. Simi- 
larly, half of the Ss were given first the brainstorm- 
ing instructions and then the nonbrainstorming in- 
structions, and the other half were given the instruc- 
tions in the reverse order. The design is illustrated 
in Table 1. The three experimental variables, In- 
structions, Problems, and Test Periods (first and 
second), were all within Ss factors, and Lindquist’s 
Type V analysis of variance (Lindquist, 1953) was
used for statistical analysis. 
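
The counterbalancing described above amounts to crossing the two possible instruction orders with the two possible problem orders. The following is a minimal sketch of that crossing, offered only as an illustration: the function name, the dictionary layout, and the group numbering are assumptions and do not appear in the original report.

```python
from itertools import product

# Labels follow the paper: B = brainstorming, NB = nonbrainstorming.
INSTRUCTION_ORDERS = [("B", "NB"), ("NB", "B")]
PROBLEM_ORDERS = [("Hanger", "Broom"), ("Broom", "Hanger")]


def build_design():
    """Cross the instruction orders with the problem orders.

    Returns one dict per group. The group numbers are illustrative only.
    """
    design = []
    for number, (instr, prob) in enumerate(
        product(INSTRUCTION_ORDERS, PROBLEM_ORDERS), start=1
    ):
        design.append(
            {
                "group": number,
                "first": (instr[0], prob[0]),    # (instructions, problem)
                "second": (instr[1], prob[1]),
            }
        )
    return design


if __name__ == "__main__":
    for row in build_design():
        print(row)
```

Across the four groups every combination of instructions, problem, and test period occurs exactly once, which is what allows the three factors to be separated in the analysis.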

Ratings. Each response was copied onto a sepa- 
rate slip of paper, given a code number, and then 
presented to the rater for evaluation; hence the rater 
was never aware of whether he was scoring a re- 
sponse produced under the brainstorming or the 
nonbrainstorming instructions. The rater was in- 
structed to rate each response on 3-point scales of 
(a) uniqueness—the degree to which the response 
departed from the conventional use of the object, 
and (b) value—the degree to which the response 
was judged to have social, economic, aesthetic, or 
other usefulness. The rater had previously used 
these scales in connection with other research
(Meadow & Parnes, 1958). Any response which 
duplicated in essential meaning any other response 
was eliminated from the scoring. A response was 


Table 1


Experimental Design (one row per group)


First Test Period               Second Test Period

Instructions    Problem         Instructions    Problem

B               Hanger          NB              Broom
B               Broom           NB              Hanger
NB              Hanger          B               Broom
NB              Broom           B               Hanger


Note.—B refers to brainstorming; NB refers to nonbrain-
storming.


Table 2


Mean Numbers of Good Solutions


                          Test Periods

                    First      Second     Both
                    Test       Test       Tests

Instructions:
  B                            6.88       7.94
  NB                           4.50       3.88

Problems:
  Hanger            7.62       5.06
  Broom             4.62       6.31       5.47


Note.—B refers to brainstorming; NB refers to nonbrain-
storming.


designated “good” if the combined uniqueness and 
value score was 5 or 6. The performance measures 
were the number of good responses and the number 
of good responses expressed as a percentage of the 
total number of responses. 
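
The scoring rule just described can be stated compactly. The following is a minimal sketch under the assumption that each 3-point scale is coded 1 to 3 (the report gives only the combined cutoff of 5 or 6); the function names and the sample ratings are hypothetical.

```python
GOOD_THRESHOLD = 5  # a combined score of 5 or 6 on the two 3-point scales


def is_good(uniqueness: int, value: int) -> bool:
    """A response is 'good' when uniqueness + value is 5 or 6."""
    return uniqueness + value >= GOOD_THRESHOLD


def performance(ratings):
    """Return (number of good responses, good responses as % of total).

    `ratings` is a list of (uniqueness, value) pairs for one S, with
    duplicate responses already removed.
    """
    n_good = sum(1 for u, v in ratings if is_good(u, v))
    percent = 100.0 * n_good / len(ratings) if ratings else 0.0
    return n_good, percent


if __name__ == "__main__":
    sample = [(3, 3), (3, 2), (2, 2), (1, 3)]  # hypothetical ratings
    print(performance(sample))
```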

The inter-rater reliability for the Broom problem, 
for ratings of 30 Ss selected at random, was .91. 
The inter-rater reliability for the Hanger problem 
was not determined for the present data, but had 
been found to be .74 in a previous experiment in-
volving the same raters (Meadow & Parnes, 1958).2


Results 


The mean numbers of good solutions pro- 
duced under the two instructions, for the first 
and second test periods separately, are pre- 
sented in Table 2. These data show that 
more good solutions were produced under the 
brainstorming instructions than under the 
nonbrainstorming instructions, and that this 
effect was greater in the first test period than 
in the second. These effects were statisti- 
cally significant, as indicated by the signifi- 
cant main effect of Instructions and the sig- 
nificant interaction between Instructions and 
Test Periods (see Table 3). The simple ef-
fects involved in this interaction were ana-
lyzed by means of the Cochran-Cox approxi-
mate t test (Cochran & Cox, 1950, pp. 92-
93). Significantly more good solutions were
produced under the brainstorming instruc- 
tions when they were given first than when 


2 Scores on the Hanger problem in the previous
experiment correlated with other creative ability tests
as follows: Guilford’s Unusual Uses, .473; Guilford’s
Plot Titles High, .452; Guilford’s Apparatus, .301;
TAT Originality, .520. All but the Apparatus cor- 
relation were at the .01 level of significance. The 
correlation with the Apparatus test was significant 
at the .05 level (Meadow & Parnes, 1958). 





Brainstorming Instructions and Problem Sequence 


they followed nonbrainstorming instructions 
(t = 2.30, df = 28, .05 > p > .02). There
was no significant difference in performance
under nonbrainstorming instructions in the
first test period as against the second (t =
1.36, df = 28, p > .10).

The mean numbers of good solutions pro- 
duced on the Hanger and Broom problems 
for the first and second test periods are given 
in Table 2. These data suggest that there 
were more good solutions of the Hanger prob- 
lem than the Broom problem in the first test 
period, but that this trend was reversed in 
the second test period. This effect was sta- 
tistically significant, as indicated by the sig- 
nificant Problems by Test Periods interaction 
in Table 3. The Cochran-Cox approximate 
t test was used to analyze the simple effects 
involved in this interaction. For the Hanger 
problem the difference between performance 
in the first and second test periods was sta-
tistically significant (t = 2.78, df = 28, p <
.01), but for the Broom problem the differ-
ence between the first and second test periods
was not significant (t = 1.84, df = 28, .10 >
p > .05).

The over-all difference between the first 
and second test periods and the over-all dif- 
ference between the Hanger and Broom prob- 
lems were not significant. 


Table 3


Summary of Analysis of Variance of Absolute
Number of Good Solutions


Source                        df        MS

Between Ss                    31
  I × P
  I × T
  P × T
  Error (b)

Within Ss
  Instructions (I)                      264.06
  Problems (P)                           12.25
  Test Periods (T)                        3.06
  I × P × T                              15.99
  Error (w)                               4.59

Total


Note.—Instructions refer to Brainstorming vs. Nonbrain-
storming; Problems refer to Hanger vs. Broom; Test Periods
refer to the first test administration vs. the second adminis-
tration.


Table 4

Summary of Analysis of Variance of
First Test Period Only


Source                                df        MS

Brainstorming-Nonbrainstorming                  235.00
Broom-Hanger                                     72.00
Interaction                                       8.00
w-cells                                           8.39

Total


Since both Instructions and Problems inter- 
acted with Test Periods, the data for the first 
test period only were analyzed separately, by 
means of a factorial analysis of variance 
(Lindquist, 1953), summarized in Table 4. 
The main effects of Instructions and of Prob- 
lems were both statistically significant, indi- 
cating that more good solutions were pro- 
duced in the first test period under the 
brainstorming instructions than under the 
nonbrainstorming instructions, and that there 
were more good solutions of the Hanger prob- 
lem than the Broom problem. 


Discussion 


The present experiment tests the assump- 
tion that the brainstorming method leads to 
an increase in creativity on certain creative 
thinking tasks. The findings support this as- 
sumption to the extent that our measure of 
creativity is a valid one. The E’s concept of 
creativity, defined by the rating scale, has 
face validity. More importantly, it has con- 
current validity, in that it correlates signifi- 
cantly (though not highly) with other meas- 
ures of creative ability. 

Although the principal training method uti-
lized in the previous experiment (Meadow
& Parnes, 1958) was “brainstorming,” other 
supplementary training methods described by 
Osborn (1957) were also employed. The re-
sults of the present experiment accordingly
provide a more decisive test of the efficacy of 
the brainstorming method per se. 

One point should be emphasized with re- 
spect to the interpretation of both experi- 
ments. Trained Ss were utilized in each 
study. An experiment designed to evaluate 
the effects of brainstorming with untrained
Ss is now in progress. 


Summary 


The experiment was designed to study the 
effects on creative problem solving of instruc-
tions to express solutions without evaluation 
(brainstorming) and instructions which re- 
quired only solutions of good quality and 
which involved a penalty for solutions of bad 
quality (nonbrainstorming). Each S was 
given two problems which required creative 
ability, in two testing periods. One problem 
was administered under brainstorming in-
structions; the other problem was adminis-
tered under nonbrainstorming instructions.
The quality of the solutions was later evalu- 
ated by a trained rater. 

The major findings of the experiment were 
the following. (a) Significantly more good 
solutions were produced under the brain-
storming instructions than under the non-
brainstorming instructions. (b) Significantly
more good solutions were produced under the 
brainstorming instructions when they were 
given first than when they followed nonbrain-
storming instructions. There was no signifi-
cant difference in the nonbrainstorming per-
formance in the two test periods.


Parnes, and Hayne Reese 


Some time ago a study was made of the
consistency of output of a group of machine 
operators (Rothe, 1947). That study re- 
vealed a relative lack of consistency of pro- 
duction from one two-week period to the next. 
The operators were paid on a straight hourly 
rate, and not by any type of incentive system. 

Other studies of the output of various 
groups of industrial operators showed, in gen- 
eral, rather low consistency of performance 
when a day rate pay system was in effect, 
and a high consistency when an incentive pay 
system was used (Rothe, 1946, 1951; Rothe
& Nye, 1958). These studies also indicated
another difference that seemed to vary ac-
cording to the method of payment in effect;
namely, under an incentive system the aver-
age output of a group of operators over a pe-
riod of time showed greater variability than
did the output of a single operator observed
over a period of time; and, conversely, with-
out an incentive system, the output of a sin- 
gle operator observed over a period of time 
showed greater variability than did the aver- 
age output of a group of operators over a 
period of time. 

These observations led to the formulation 
of two hypotheses relating output consistency 
to the adequacy of the incentives in operation. 
The hypotheses are stated here as follows: 
(a) “the incentives to work may be consid- 
ered ineffective when the ratio of the range 
of intra-individual differences is greater than 
the ratio of the range of interindividual dif-
ferences” (Rothe, 1946) and (b) if the inter-
correlation of output rates for two periods
closely related in time is less than .70, the 
incentivation is not highly effective, while in- 
tercorrelation higher than .80 indicates effec- 
tive incentivation (Rothe & Nye, 1958). 
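
The two hypotheses can be read as simple decision rules. The following is a minimal sketch of those rules, not a procedure given in the earlier papers: the function names and return labels are illustrative, and the label "indeterminate" for correlations between .70 and .80 is an assumption, since the hypotheses as stated leave that interval unclassified.

```python
def ratio_criterion(intra_ratio: float, inter_ratio: float) -> str:
    """Hypothesis (a): incentives are considered ineffective when the
    intra-individual ratio exceeds the interindividual ratio."""
    if intra_ratio > inter_ratio:
        return "ineffective"
    return "not ineffective"


def correlation_criterion(r: float) -> str:
    """Hypothesis (b): r below .70 suggests incentivation is not highly
    effective; r above .80 suggests it is effective."""
    if r > 0.80:
        return "effective"
    if r < 0.70:
        return "not highly effective"
    return "indeterminate"  # assumed label for the unclassified interval


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(ratio_criterion(1.5, 1.2))
    print(correlation_criterion(0.65))
```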

These hypotheses have never been tested
experimentally. It is quite likely that they
never will be, since an experiment seems al-
most doomed to remove some of the aspects
of a normal industrial situation. The best
check of the validity of these hypotheses ap-
parently is to make study after study of
workers at their work places to see if these
observations continue to hold true.

The purpose of this paper is to report an- 
other study of industrial machine operators, 
working under different conditions of financial 
incentivation. In the present instance it is 
necessary to keep anonymous the name of the 
plant involved. It shall be called Plant B. 
Where reference is made to the earlier study 
of machine operators, the name Plant A will 
be used in this paper. 


Background of the Study 


Plant A, previously reported, involved 130 men 
over a six-week period. The men were paid an 
hourly rate (nonincentive) but, since standards ex- 
isted for each job, it was possible to compare the 
output of men on various types of machines and 
operations. The plant was located in a Wisconsin
city of between 5,000 and 6,000 persons, and Plant A
was by far the largest industry in the city.

Plant B is also located in Wisconsin, again in a 
city of about 6,000 persons. It is one of several 
manufacturing plants in town, no one of which is 
relatively as large as was Plant A in its town. Plant 
B has several hundred employees in its machine 
shops. Their work is generally similar to the work 
of the employees of Plant A. One outstanding dif-
ference is that the employees in Plant B are paid
according to an incentive system. Data for 42 men 
over a period of 10 weeks in 1958 were taken from 
the factory records. 

The most readily observable differences between 
the two plants were: (a) Although both plants were 
located in small cities, Plant A was by far the largest 
establishment in its city, while Plant B was one of 
several approximately equal sized plants in its city; 
(b) Plant A was mainly locally owned, while Plant 
B is a branch plant of a company whose main offices 
are elsewhere; (c) Plant A paid by an hourly rate 
while Plant B paid according to a financial incen- 
tive system. 

One other interesting observation should be made.
At the time of the study in Plant A (1946) there
were acute shortages of materials for most plants,
including Plant A. At the time of this later study
there was a period of economic adjustment, or re-
cession, and business was generally not good. Plant
B felt the effects of this condition and was laying
off employees at the time this study was made.
Thus it will be noted in the tables in this paper
that Plant B does not show 42 men for each week.
Some men were being laid off, and some were re-
called, during the period of the study.


Table 1

Weekly Average Output (Percentage Performance of
Standard) for Group of Machine Operators


                    Percentage       Number of
Week Ending         Performance      Employees

February 24           122.6             42
March 3               120.9             39
      10              127.4             33
      17              125.7             32
      24              127.9
                      122.8
                      124.9             34
                      126.8             28
                      121.9             33
                      122.9             26


Data 


The weekly average output for the group,
and the number of employees whose output
was used in this study, are shown in Table 1.
Inspection of this table shows the percentage
of performance to standard was quite stable
but the number of employees generally was
smaller in each successive week. The per-
formance was at a high level, since 100%
was standard. This means that the group of
employees averaged about 25% pay above
their base rates. It indicates that the incen-
tive system was effective since it apparently
“incentivated” the employees to produce more
than what management considered to be
standard production.


Table 2


Frequency Distributions of Weekly Average Output
of All Operators for Two Selected Weeks


[Columns: Percentage Performance to Standard; No. of
Employees, “Most” Week; No. of Employees, “Least” Week.
The class intervals run from 155 down to “Below 64”;
the frequencies are not recoverable.]


Note.—* = median.

The distribution of output of all operators
for any one week was somewhat skewed, al-
though it approximated a normal distribu-
tion. (The output of the various other groups
of industrial workers reported in previous
studies was normally distributed.) To show
the shape of the distribution in this study,
the outputs for the week when there were the
most employees and the week when there
were the fewest employees were selected. The
distributions of output for these two weeks
are shown in Table 2. It is interesting to
note that the medians for both weeks were
the same, although in one week there were 42
employees and in the other week only 26.


Table 3


Frequency Distribution of r’s between Successive
Weeks’ Output, Individual Performance,
for Group of Machine Operators


r               Frequency

.91-1.00
.81- .90
.71- .80
.61- .70
.51- .60


Note.—Median r = .78.

The correlation of each employee’s per- 
formance for one week with his performance 
for the following week was determined by the 
Pearsonian r even though the distributions 
were slightly skewed. This was done in or- 
der to facilitate comparisons with the earlier 
studies. The distribution of these r’s is shown 
in Table 3 where the median r is .78. The 
range of r’s was much smaller than for the 
group of coil winders, previously reported 
(Rothe & Nye, 1958), and was approximately 
the same as for the chocolate dippers (Rothe, 
1951). 
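
For readers who wish to repeat the week-to-week computation on their own records, the product-moment correlation can be sketched as follows. This is a generic textbook formula rather than a reconstruction of the original hand calculation, and the two lists of outputs are hypothetical figures, not data from Plant B.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical percentage-of-standard outputs for the same four
    # operators in two successive weeks.
    week_1 = [120.0, 135.5, 98.2, 142.0]
    week_2 = [118.4, 139.0, 101.7, 140.3]
    print(round(pearson_r(week_1, week_2), 2))
```

Only operators present in both weeks can enter a given correlation, so with layoffs and recalls the number of pairs will differ from one pair of weeks to the next.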

This r of .78 does not quite meet the r of
.80 that has been hypothesized to indicate





Output Rates among Machine Operators 


Table 4


Highest and Lowest Average Weekly Performances, and Their Ratios for Individual Machine Operators
during 10-Week Period


                Highest       Lowest        Ratio of
                Weekly        Weekly        Highest
Employee        Average       Average       to Lowest

                130.3          87.4
                139.6         101.2
                138.8         100.0
                140.3         105.7
                145.5         127.3
                146.9         121.8
                144.6         116.6
                154.8         141.8
                146.3         114.5
                146.3         123.0
                148.0         140.0
                156.0         102.0
                142.7         129.2
                141.5          85.1
                147.3         107.3
                142.2         137.5
                145.0         103.0
R               118.3          67.0


Note.— Median intra-individual ratio = 1.29.


effective incentivation, but is obviously so 
close that the difference is statistically insig- 
nificant. 

The output for the most productive and 
least productive weeks for each employee for 
any one of the 10 weeks, and the ratio of 
highest to lowest performance, are shown in 
Table 4. The average (median) ratio of 
these intra-individual differences is 1.29. 
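
The intra-individual ratio is simply each operator's highest weekly average divided by his lowest. The following is a minimal sketch of that computation using three Highest and Lowest pairs as printed in Table 4; the function name is illustrative, and the median printed here is for these three rows only, not the 1.29 reported for the full group.

```python
from statistics import median


def intra_individual_ratio(highest: float, lowest: float) -> float:
    """Ratio of an operator's highest weekly average to his lowest."""
    return highest / lowest


if __name__ == "__main__":
    # Three (highest, lowest) pairs taken from Table 4.
    rows = [(135.8, 76.1), (109.4, 53.1), (133.7, 119.0)]
    ratios = [round(intra_individual_ratio(h, l), 2) for h, l in rows]
    print(ratios)          # ratios for the three operators
    print(median(ratios))  # median of these three rows only
```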

The ratio of best operator to worst opera-
tor for each week is shown in Table 5 where
the average (median) ratio of interindividual 
differences is 2.89. Thus the second hypo- 
thetical requirement for effective incentivation 
has been met in this situation, since the ratio 
for interindividual differences exceeds the 
ratio for intra-individual differences. 


Discussion 


Two hypotheses were presented in earlier 
papers, and data from another factory, Plant 
B, were related to these hypotheses. In gen- 
eral, the results obtained in Plant B support 
the two hypotheses. This is interpreted as 
strengthening, but not proving, the hy- 
potheses. 


                Highest       Lowest        Ratio of
                Weekly        Weekly        Highest
Employee        Average       Average       to Lowest

S               135.8          76.1         1.78
T               109.4          53.1         2.06
U               133.7         119.0         1.12
V               138.1          95.9         1.44
                140.1         135.0         1.04
                144.9         113.8         1.27
                147.2         105.3         1.40
                143.1         139.1         1.03
                139.2         110.0         1.27
                 66.9          39.3         1.70
                113.7          47.3         2.40
                139.5          95.9         1.45
                147.8         142.4         1.04
                129.6          70.7         1.83
                139.9         107.7         1.30
                143.5          99.5         1.44
                137.8         112.0         1.23
                148.5         146.2         1.02

The week-to-week intercorrelation of out- 
put rates meets (for all practical purposes) 
the criterion of .80 which supposedly indi- 
cates that the incentives to work are effective. 
The average weekly output was about 125% 


Table 5


Highest and Lowest Average Individual Weekly Per-
formances and Their Ratios for Group of
Machine Operators during
10-Week Period


                   Highest         Lowest          Ratio of
                   Employee’s      Employee’s      Highest
Week Ending        Average         Average         to Lowest

February 24        148.8           39.6            3.96
March 3            148.1           39.3            3.77
      10           148.0           58.6            2.52
      17           147.9           64.8            2.28
      24           151.1           53.1            2.84
      31           147.7           47.3            3.12
                   147             61.1            2.37
      14           147.9           50.4
      21           148.5           42.5
      28           156.0           55.2


Note.— Median interindividual ratio = 2.89.
of standard and this suggests that the incen-
tives did indeed incentivate. 

The ratio of interindividual differences ex- 
ceeded the ratio of intra-individual differ- 
ences, and this also has been hypothesized as 
existing where the incentives are effective. 

In this situation again, as in the previous 
studies, the week-to-week intercorrelation of 
output rates is low when viewed from the 
standpoint of using production data as a cri-
terion for some other variable. Psychologists
would not be impressed greatly by a test that
had a test-retest reliability of .80, but the cri-
terion against which they often validate their
tests is rarely this high. This results in the
unique situation whereby the tests that are
used are more stable measures than the vari-
able that the tests are intended to predict.


Harold F. Rothe and Charles T. Nye


A FURTHER INVESTIGATION OF THE PRETEST-
TREATMENT INTERACTION EFFECT 1


ROBERT E. LANA 


American University


It has previously been demonstrated (a) 
that a pretest-treatment interaction effect 
does not occur in the pretest-treatment-post- 
test attitude change research design when the 
attitude involved is of little concern to Ss 
(attitude of nonagricultural college students 
toward vivisection at a time when there were 
no newspaper campaigns on the matter and 
before the announcement by the Russians 
that they had launched a dog in a rocket). 
The purpose of this study is to examine this 
basic attitude change research design for a 
pretest-treatment interaction effect when the 
attitude in question is one of somewhat 
greater concern to Ss than the topic of vivi- 
section. It was inferred that attitude toward 
ethnic groups would fulfill this requirement 
on the basis that school integration was a 
salient topic of discussion at the time in the 
newspapers and in talks given on campus. 
The Ss were largely inhabitants of southern 
border states, and the city in which they at- 
tended college had integrated its public 
schools just a few years previously. Conse- 
quently, attitude toward vivisection on the 
one hand and attitude toward ethnic groups 
on the other represent some distance on a 
continuum of involvement for Ss used in the 
study and may differentially affect the rela- 
tionship between an attitudinal pretest and a 
persuasive treatment of some kind. A dis- 
cussion of the importance of this effect for 
attitude change methodology is included in 
the study mentioned above (a). 


Method 


Two hundred and twenty-four students in four in-
troductory classes at The American University served
as Ss in the experiment. These groups were ran-
domly assigned to four treatment conditions pre-
sented in Table 1. Two of these groups received a
modified form of the California Ethnocentrism Scale
consisting of 20 Likert-type items as the pretest atti-
tude questionnaire. A high score represented high
ethnocentrism. One of these two groups viewed the
mental health film (b) on ethnic prejudice, “High
Wall,” 12 days after taking the pretest. After treat-
ment, this group (Group I) was immediately post-
tested with the same questionnaire that served as
the pretest. The other group (Group IV) was sim-
ply posttested 12 days later. Group II viewed the
film and was posttested immediately afterward with-
out having been pretested. Group III answered the
questionnaire once. Two other groups, which were
not included in the experimental design and which
had a total N of 100, were simply pretested in order
to examine the comparability of Ss’ initial attitudes
on ethnocentrism. For an examination of the in-
teraction hypothesis a factorial analysis of variance
was used with .05 as the acceptable level of signifi-
cance.

1 Taken from a paper read at the 1959 Eastern
Psychological Association convention in Atlantic
City, N. J.


Results 


The four groups of pretest scores, including 
Groups I and IV of the experimental design 
and the two groups not part of this design, 
were submitted to a Bartlett’s Test and found 
to be homogeneous with respect to variance. 
A simple analysis of variance was then per- 
formed on the four pretest means. The re- 
sulting F ratio was not significant at the .05 
level. The Ss were judged to have the same 
initial ethnocentric attitudes in each of the 
groups on the strength of this evidence. 

The variances of the two sets of pretest
scores of the experimental groups receiving a
pretest were examined with Bartlett’s Test
and found to be homogeneous. A t test be-
tween the means of these two sets of pretest
scores was not significant at the .05 level.
The conclusion is drawn that the groups re-
ceiving the pretest in the experiment were
initially homogeneous with respect to attitude
toward ethnic groups.


Table 1

Experimental Design

                                Groups
                  I             II            III           IV

                  Pretest                                   Pretest
Conditions        12 days                                   12 days
                  Treatment     Treatment
                  Posttest      Posttest      Posttest      Posttest


Table 2


Summary of Means and Standard Deviations
of Posttest Scores


[Rows: Group I (Pretest and communication); Group II (No
pretest and communication); Group III (No pretest and no
communication); Group IV (Pretest and no communication).
The tabled values are not recoverable.]


Means and standard deviations of the posttest
scores appear in Table 2. A Bartlett’s Test
was then applied to the four posttest results,
and the resulting chi square was not significant.
A factorial analysis of variance was
then performed on the posttest means for the
four groups. A summary of the analysis of
variance results appears in Table 3. The F
ratio for the treatment effect was significant
at the .05 level, which implies that the film
was successful in changing opinion about
ethnocentrism. The interaction effect between
questionnaire and film was not significant.
Consequently, the pretest and treatment did
not interact, which implies that the pretest
did not sensitize Ss to the communication
even though the topic involved was of relative
importance to them.
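For a 2 x 2 layout with unequal cell sizes, one common approximation is the unweighted-means analysis, in which each effect is a single-df contrast among the four cell means scaled by the harmonic mean of the cell sizes. Whether this matches the Walker and Lev procedure in every detail is an assumption; the sketch below is offered only to illustrate the logic of testing treatment, pretest, and interaction effects. The function name and all scores are hypothetical.

```python
from statistics import fmean


def unweighted_means_anova(cells):
    """2 x 2 unweighted-means analysis (an approximation for unequal Ns).

    cells maps (pretested, saw_film) -> list of posttest scores.
    Returns (F ratios by effect, error df); each F has 1 and N - 4 df.
    """
    keys = [(True, True), (True, False), (False, True), (False, False)]
    means = {k: fmean(cells[k]) for k in keys}
    sizes = {k: len(cells[k]) for k in keys}

    # The harmonic mean of the cell sizes stands in for a common n.
    n_h = 4 / sum(1 / sizes[k] for k in keys)

    # The pooled within-cell variance serves as the error mean square.
    ss_within = sum((x - means[k]) ** 2 for k in keys for x in cells[k])
    df_error = sum(sizes.values()) - 4
    ms_error = ss_within / df_error

    def ss(contrast):
        # Sum of squares for one single-df contrast among the cell means.
        value = sum(c * means[k] for c, k in zip(contrast, keys))
        return n_h * value ** 2 / 4

    effects = {
        "pretest":     ss([+1, +1, -1, -1]),
        "treatment":   ss([+1, -1, +1, -1]),
        "interaction": ss([+1, -1, -1, +1]),
    }
    return {name: s / ms_error for name, s in effects.items()}, df_error


# Hypothetical posttest scores (not the study's data).
scores = {
    (True, True):   [41, 45, 39, 44, 42],         # Group I
    (False, True):  [40, 46, 43, 41],             # Group II
    (False, False): [50, 47, 52, 49, 51, 48],     # Group III
    (True, False):  [49, 53, 48, 50, 52],         # Group IV
}

f_ratios, df_error = unweighted_means_anova(scores)
for effect, f in f_ratios.items():
    print(f"{effect:12s} F(1, {df_error}) = {f:.2f}")
```

A sensitizing effect of the pretest would appear as a large interaction F: the film would change posttest scores more (or less) in the pretested groups than in the unpretested ones.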


Table 3

Analysis of Variance on Posttest Scores

Source                 df    SS      MS      F

Treatment              1     61.23   61.23   16.69
Pretest                1     4.52    4.52    1.20
T X P (Interaction)    1     3.18    3.18    <1
Error(a)               21            3.68

(a) Error term was computed by the Walker and Lev
approximation method for unequal Ns.
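Each F ratio in Table 3 is the mean square for an effect divided by the error mean square. The short check below recomputes the ratios from the printed mean squares; they come out at about 16.6, 1.2, and 0.9, close to but not identical with the printed values, presumably because of rounding in the original computations. Reading the 21 in the Error row as the error df is an assumption, and the critical value is the tabled .05 point of F for 1 and 21 df.

```python
# Mean squares as printed in Table 3.
MS_EFFECTS = {"Treatment": 61.23, "Pretest": 4.52, "T X P": 3.18}
MS_ERROR = 3.68

# Assumption: 21 is the error df, so the .05 critical F(1, 21) is 4.32.
# With a larger error df the critical value only shrinks (toward 3.84),
# so the pattern of conclusions below would be unchanged.
F_CRIT_05 = 4.32

f_ratios = {source: ms / MS_ERROR for source, ms in MS_EFFECTS.items()}

for source, f in f_ratios.items():
    verdict = "significant" if f > F_CRIT_05 else "not significant"
    print(f"{source:10s} F = {f:5.2f}  ({verdict} at .05)")
```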


Discussion 


Since the F ratio representing the interac- 
tion effect of pretesting and treatment was 
not significant, it can be concluded that the act
of pretesting a group of Ss with a question- 
naire does not influence their subsequent re- 
actions to a persuasive appeal in terms of atti- 
tude change toward a topic of some impor- 
tance to them. Apparently, an attitudinal 
pretest has no effect on the reception of a 
succeeding persuasive communication within 
the limits of involvement of S with the topi- 
cal continuum represented by vivisection (a) 
at one point and ethnic prejudice at another. 
Perhaps the interaction effect between pretest 
and treatment occurs only when the com- 
munication presents information to S such 
that a learning situation is involved rather 
than a change of attitude. This contention 
remains to be experimentally determined. 